https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81114

            Bug ID: 81114
           Summary: GNAT mishandles filenames with UTF8 chars on
                    case-insensitive filesystems
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
          Assignee: unassigned at gcc dot gnu.org
          Reporter: simon at pushface dot org
  Target Milestone: ---
             Build: x86_64-apple-darwin16

Created attachment 41575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41575&action=edit
Demonstrator (with BOM)

The attached demonstrator contains two files, each with a UTF8
BOM. One file, pack3_user.adb, contains

   with Páck3;
   procedure Pack3_User is
   begin
      null;
   end Pack3_User;

while the other, páck3.ads, contains just

   package Páck3 is
   end Páck3;

There is no problem compiling on Linux (Debian Jessie). However, on
Darwin and Windows, we get

   $ gnatmake -c -f pack3_user.adb
   gcc -c pack3_user.adb
   gnatmake: "p?ck3.ads" not found

This is perhaps partly explained by looking at pack3_user.ali:

====================
V "GNAT Lib v8"
M P W=8
P ZX

RN

U pack3_user%b          pack3_user.adb          be67fdbd NE OO SU
W pUe1ck3%s             p?ck3.ads               p?ck3.ali           [A]

D p?ck3.ads             20170615165452 7221d8b1 páck3%s             [B]
D pack3_user.adb        20170616143450 cc46250c pack3_user%b
D system.ads            20161018202953 085b6ffb system%s
X 1 páck3.ads                                                       [C]
[...]
====================

from which ([A], [B]) it is clear that GNAT is sometimes confused
about the file names.

Interestingly, sometimes it gets it right (last component on [B],
[C]).

The ALI file is written by Lib.Writ.Write_ALI. In two places it says

   if not File_Names_Case_Sensitive then
      Get_Name_String (Fname);
      To_Lower (Name_Buffer (1 .. Name_Len));    <<<<<<<<<
      Fname := Name_Find;
   end if;

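which is clearly the Wrong Thing to do if the file name is not
ASCII: To_Lower appears to fold each byte as if it were a Latin-1
character. In the ALI file above, the small-a-acute, which should be
encoded in UTF-8 as C3 A1, has been rendered as E3 A1 (C3 is Latin-1
capital-A-tilde, whose lower-case form is E3).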
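
The byte-level effect can be reproduced outside the compiler. The
following is only a sketch: it uses Ada.Characters.Handling.To_Lower
as a stand-in for the compiler's internal To_Lower (an assumption;
the two need not be the same routine), and Fold_Demo and Put_Hex are
names made up for the illustration.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Fold_Demo is
   Hex_Digits : constant String := "0123456789ABCDEF";

   --  Print each byte of S as two hex digits followed by a space.
   procedure Put_Hex (S : String) is
   begin
      for C of S loop
         Ada.Text_IO.Put (Hex_Digits (Character'Pos (C) / 16 + 1));
         Ada.Text_IO.Put (Hex_Digits (Character'Pos (C) mod 16 + 1));
         Ada.Text_IO.Put (' ');
      end loop;
      Ada.Text_IO.New_Line;
   end Put_Hex;

   --  UTF-8 bytes of "páck3.ads"; small-a-acute is C3 A1.
   Name : constant String :=
     "p" & Character'Val (16#C3#) & Character'Val (16#A1#) & "ck3.ads";
begin
   --  Bytes as stored on disk.
   Put_Hex (Name);
   --  Bytes after a Latin-1 fold: C3 becomes E3, A1 is left alone.
   Put_Hex (Ada.Characters.Handling.To_Lower (Name));
end Fold_Demo;
```

The second line of output starts 70 E3 A1, which matches the bytes
seen in the ALI file.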

Using the undocumented env var GNAT_FILE_NAME_CASE_SENSITIVE alters
things:

   $ GNAT_FILE_NAME_CASE_SENSITIVE=1 gnatmake -c -f pack3_user.adb
   gcc -c pack3_user.adb
   gcc -c páck3.ads

so it's clear that the problem lies in this region.

Interestingly, [B] and [C] above show that the compiler does
understand how to lower-case extended characters in strings. I haven't
yet been able to find where this is done.
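
For comparison, here is a sketch of what a UTF-8-aware fold looks
like using only the standard Ada 2012 library. This is not how the
compiler does it (that is exactly what I haven't found), and it
assumes the file name really is UTF-8; UTF8_Fold_Demo and Fold_UTF8
are made-up names.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Wide_Wide_Characters.Handling;

procedure UTF8_Fold_Demo is

   --  Decode the UTF-8 bytes to code points, lower-case those,
   --  then encode the result back to UTF-8.
   function Fold_UTF8 (Name : String) return String is
      Decoded : constant Wide_Wide_String :=
        Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (Name);
   begin
      return Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
        (Ada.Wide_Wide_Characters.Handling.To_Lower (Decoded));
   end Fold_UTF8;

   --  UTF-8 bytes of "páck3.ads"; small-a-acute is C3 A1.
   Name : constant String :=
     "p" & Character'Val (16#C3#) & Character'Val (16#A1#) & "ck3.ads";
begin
   --  The name is already lower case, so a correct fold must
   --  return it byte for byte.
   Ada.Text_IO.Put_Line
     ("unchanged: " & Boolean'Image (Fold_UTF8 (Name) = Name));
end UTF8_Fold_Demo;
```

Folding code points rather than bytes leaves the C3 A1 sequence
alone, since small-a-acute is already lower case.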
