https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81114

            Bug ID: 81114
           Summary: GNAT mishandles filenames with UTF8 chars on
                    case-insensitive filesystems
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
          Assignee: unassigned at gcc dot gnu.org
          Reporter: simon at pushface dot org
  Target Milestone: ---
             Build: x86_64-apple-darwin16

Created attachment 41575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41575&action=edit
Demonstrator (with BOM)

The attached demonstrator contains two files, each with a UTF8 BOM. One file,
pack3_user.adb, contains

   with Páck3;
   procedure Pack3_User is
   begin
      null;
   end Pack3_User;

while the other, páck3.ads, contains just

   package Páck3 is
   end Páck3;

There is no problem compiling on Linux (Debian Jessie). However, on Darwin and
Windows, we get

   $ gnatmake -c -f pack3_user.adb
   gcc -c pack3_user.adb
   gnatmake: "p?ck3.ads" not found

This is perhaps partly explained by looking at pack3_user.ali:

====================
V "GNAT Lib v8"
M P W=8
P ZX
RN
U pack3_user%b pack3_user.adb be67fdbd NE OO SU
W pUe1ck3%s p?ck3.ads p?ck3.ali                             [A]
D p?ck3.ads 20170615165452 7221d8b1 páck3%s                 [B]
D pack3_user.adb 20170616143450 cc46250c pack3_user%b
D system.ads 20161018202953 085b6ffb system%s
X 1 páck3.ads                                               [C]
[...]
====================

from which ([A], [B]) it is clear that GNAT is sometimes confused about the
file names. Interestingly, sometimes it gets it right (last component on [B],
[C]).

The ALI file is written by Lib.Writ.Write_ALI. In two places it says

   if not File_Names_Case_Sensitive then
      Get_Name_String (Fname);
      To_Lower (Name_Buffer (1 .. Name_Len));     <<<<<<<<<
      Fname := Name_Find;
   end if;

which is clearly the Wrong Thing to do if the file name is not ASCII. In the
ALI file above, the small-a-acute, which should be encoded as C3 A1, has been
rendered as E3 A1.

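
The C3 -> E3 change is what one would expect if the UTF-8 bytes of the name are
lowered one at a time under Latin-1 rules: C3 is capital A-tilde in Latin-1, and
its lower-case form is E3. The following is only a sketch of that effect, not
the compiler's code. It assumes that the To_Lower called in Lib.Writ behaves
like Ada.Characters.Handling.To_Lower, which is used here as a stand-in; the
procedure name Lower_Utf8_Demo and the helper Put_Hex are made up for the
illustration.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Lower_Utf8_Demo is
   use Ada.Text_IO;

   --  "páck3.ads" spelled out as raw UTF-8 bytes (a-acute is C3 A1), so
   --  the demo does not depend on the encoding of this source file.
   Name : constant String :=
     "p" & Character'Val (16#C3#) & Character'Val (16#A1#) & "ck3.ads";

   --  Byte-wise lowering under Latin-1 rules. This is a stand-in for the
   --  To_Lower call in Lib.Writ (an assumption, not the compiler's code).
   Lowered : constant String := Ada.Characters.Handling.To_Lower (Name);

   Hex_Digits : constant String := "0123456789ABCDEF";

   --  Print each byte of S as two hex digits followed by a space.
   procedure Put_Hex (S : String) is
   begin
      for C of S loop
         Put (Hex_Digits (Character'Pos (C) / 16 + 1));
         Put (Hex_Digits (Character'Pos (C) mod 16 + 1));
         Put (' ');
      end loop;
      New_Line;
   end Put_Hex;
begin
   Put_Hex (Name);     --  second byte is C3
   Put_Hex (Lowered);  --  second byte becomes E3; A1 is left alone
end Lower_Utf8_Demo;
```

Built with gnatmake lower_utf8_demo.adb, the two lines of output should differ
only in the second byte. The result is no longer valid UTF-8 for a-acute, which
would be consistent with the "p?ck3.ads" seen in [A] and [B].
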
Using the undocumented env var GNAT_FILE_NAME_CASE_SENSITIVE alters things:

   $ GNAT_FILE_NAME_CASE_SENSITIVE=1 gnatmake -c -f pack3_user.adb
   gcc -c pack3_user.adb
   gcc -c páck3.ads

so it's clear that the problem lies in this region.

Interestingly, [B] and [C] above show that the compiler does understand how to
lower-case extended characters in strings. I haven't yet been able to find
where this is done.