Hello,

> I think, GNU Global should also have a kind of normalization layer to
> accept both normalization forms as input.

Rather, isn't this a bug in 'TextEdit'?
It seems that TextEdit converts the single character 'ä' (c3a4) in the
file name into two characters: 'a' (61) + ' ̈' (cc88).
This doesn't make sense, since you can use 'ä' (c3a4) directly as part of
a file name on APFS (I have tested this on macOS 10.15.7).
So I think the conversion is a bug unless there is a reason for it.
What do you think?

Regards,
Shigio

On Sat, Jun 26, 2021 at 10:13 PM Bernd Rellermeyer <[email protected]> wrote:
>
> I have the following problem with Unicode file names on macOS.
>
> On macOS with an APFS file system, file names are Unicode encoded, but
> not normalized.  That means that on the file system, file names can be
> either in normalization form C or in normalization form D.  Most
> applications have a normalization layer and use NFC internally, but save
> files in NFD.  Apparently this is not the case for some command line
> applications like GNU Global.  Open the macOS TextEdit application, type
> ``#define nfd;`` and save the file as a plain text file with name
> ``ä.c`` in an empty directory.  Now change into that directory on the
> command line and enter ``gtags`` and ``global -f ä.c``.  GNU Global
> tells you that ``ä.c`` is not a source file.  Now enter ``ls`` on the
> command line, copy the file name from the output and paste it as an
> argument to ``global -f``.  This time the file is found by GNU Global.
> What happens is that the file name is in NFD on the file system, but in
> NFC when typing the file name on the command line.  When copying the
> output of ``ls ä.c`` as an argument to ``global -f``, GNU Global again
> tells you that ``ä.c`` is not a source file.  The ``ls`` command
> apparently has a normalization layer that makes it accept file names in
> either normalization form as input.
> The ``ls`` command with no
> arguments prints file names in their normalization form on the file
> system, whereas typing ``ls ä.c`` on the command line prints the file
> name in NFC, independent of its normalization form on the file system.
> On the command line itself, files are saved with file names in NFC.
> When typing ``echo "#define nfc;" > ä.c``, ``gtags`` and ``global -f
> ä.c`` on the command line, the file name is in NFC on the file system.
>
> I think, GNU Global should also have a kind of normalization layer to
> accept both normalization forms as input.
>
> Kind regards
>
> Bernd Rellermeyer
>
>

--
Shigio YAMAGUCHI <[email protected]>
PGP fingerprint: 26F6 31B4 3D62 4A92 7E6F 1C33 969C 3BE3 89DD A6EB
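
For reference, the two spellings of ``ä.c`` discussed in this thread, and
the kind of normalization layer being proposed, can be sketched in a few
lines of Python.  This is only an illustration under stated assumptions:
GNU Global itself is written in C, and the helper names
``normalization_key`` and ``find_indexed_path`` are hypothetical, not part
of any existing tool.  The sketch compares names after converting both
sides to NFC, so a query typed in NFC matches a path stored in NFD and
vice versa.

```python
import unicodedata
from typing import List, Optional


def normalization_key(name: str) -> str:
    """Return the NFC form of a name, used only for comparison."""
    return unicodedata.normalize("NFC", name)


def find_indexed_path(query: str, indexed_paths: List[str]) -> Optional[str]:
    """Hypothetical lookup: return the stored path that matches the query
    regardless of normalization form, or None if nothing matches."""
    key = normalization_key(query)
    for path in indexed_paths:
        if normalization_key(path) == key:
            # Return the path exactly as stored, so it can still be opened.
            return path
    return None


nfc_name = "\u00e4.c"    # precomposed 'ä', as typed on the command line
nfd_name = "a\u0308.c"   # 'a' + combining diaeresis, as saved by TextEdit

print(nfc_name.encode("utf-8").hex())  # c3a42e63
print(nfd_name.encode("utf-8").hex())  # 61cc882e63
print(nfc_name == nfd_name)            # False: a plain byte comparison fails
print(find_indexed_path(nfc_name, [nfd_name]) == nfd_name)  # True
```

The lookup returns the stored spelling rather than the normalized one, on
the assumption that the index should keep whatever form the file system
reported.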
