>> That's not universally true anymore. Some newer filesystems are
>> mandating that filenames are UTF-8 and enforcing normalization rules
>> (MacOS X and Solaris are two notable examples).
>
>Thanks, I didn't know. Haven't used Solaris in years, and never bought
>Apple.

Let me amend this a bit; as I understand it, you have to enable that
behavior on Solaris. It's the default behavior on MacOS X.

>> Solaris is better; the original bytes are preserved, but lookup is
>> done using normalized names so you can't have two filenames with the
>> same characters.
>
>What about globbing, especially on Mac OS X? Given your two examples on
>Linux with bash,
>[...]

So, clearly we need some userspace support. AFAIK, the globbing isn't
Unicode-aware; it's just matching on whatever readdir() returns. Should
a ? match on a byte? A Unicode codepoint? An abstract character? I am
not sure, and I am not sure if anyone has decided on this from a
standards point of view.

>Do you think NFKC would be better, so ? often matches what appears as a
>single rune and fi matches ligature ﬁ?

Hm. I believe some network filesystems use NFKC, but I am neutral on
what should be done. Should fi match ﬁ? I cannot decide; I see
arguments for both.

--Ken

_______________________________________________
Nmh-workers mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/nmh-workers
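
To make the byte / codepoint / normalization question above concrete, here
is a rough Python sketch. It is only an illustration and not what bash or
any filesystem actually does. The helper names (units, glob_bytes,
glob_normalized) are made up for this example, and Python's fnmatch stands
in for a shell glob.

```python
import fnmatch
import unicodedata

# Two spellings of the same visible character "é":
nfc = "\u00e9"    # precomposed, one code point
nfd = "e\u0301"   # "e" followed by a combining acute accent

def units(s):
    """Return (UTF-8 byte count, code point count) for s."""
    return len(s.encode("utf-8")), len(s)

def glob_bytes(pattern, name):
    """Match on raw UTF-8 bytes, so ? consumes exactly one byte."""
    return fnmatch.fnmatchcase(name.encode("utf-8"), pattern.encode("utf-8"))

def glob_normalized(pattern, name, form="NFC"):
    """Normalize both sides first, so ? consumes one code point."""
    return fnmatch.fnmatchcase(
        unicodedata.normalize(form, name),
        unicodedata.normalize(form, pattern),
    )

print(units(nfc), units(nfd))

# Byte-wise: "é" is two bytes in UTF-8, so a single ? cannot match it.
print(glob_bytes("?", nfc))

# Code-point-wise without normalization: the NFD spelling is two code points.
print(fnmatch.fnmatchcase(nfd, "?"))

# After NFC normalization both spellings collapse to one code point.
print(glob_normalized("?", nfd))

# The ligature U+FB01 only becomes "fi" under the compatibility forms.
print(glob_normalized("fi*", "\ufb01le", "NFC"))
print(glob_normalized("fi*", "\ufb01le", "NFKC"))
```

Even the normalized variant matches per code point, not per abstract
character: a base letter with a combining mark that has no precomposed form
would still need two ? to match.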
