>> if not for file names?
>
>The Unix kernel stores filenames as a run of bytes, not including `/'
>and NUL.

That's not universally true anymore.  Some newer filesystems mandate
that filenames be UTF-8 and enforce normalization rules (Mac OS X and
Solaris are two notable examples).  Obviously some charset conversion
is happening for non-UTF-8 locales.

I think that's inevitable, given the issues with composed and
decomposed characters.  For example, let's say you see this:

% ls
Résumé.txt   Résumé.txt

How can that be?  Well, they aren't the same sequence of bytes.  In the
first one each “é” is U+00E9.  In the second, it's U+0065 U+0301 (a
regular “e” followed by a combining acute accent).

The only way of resolving this is to apply the Unicode normalization
rules and do filename lookup on the normalized names.  Mac OS X
actually rewrites all of the filenames using Normalization Form D (all
characters in decomposed form, which means the base character followed
by its combining accents).  I think that sucks, but they didn't ask me.
Solaris is better: the original bytes are preserved, but lookup is done
using normalized names, so you can't have two filenames that normalize
to the same characters.

--Ken

_______________________________________________
Nmh-workers mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/nmh-workers
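
To make the Résumé.txt example above concrete, here is a minimal Python
sketch of the comparison.  It only illustrates the idea: it uses plain
NFD from Python's standard unicodedata module, and the names composed,
decomposed, and same_name are made up for this example.  It is not the
lookup code of either filesystem.

```python
import unicodedata

# Two spellings of the same visible filename (illustrative values).
composed = "R\u00e9sum\u00e9.txt"        # each é is the single code point U+00E9
decomposed = "Re\u0301sume\u0301.txt"    # each é is U+0065 followed by U+0301


def same_name(a: str, b: str, form: str = "NFD") -> bool:
    """Hypothetical helper: compare two names after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A byte-for-byte comparison sees two different names.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))

# A comparison on normalized names sees one name.
print(same_name(composed, decomposed))
```

The first comparison is what a kernel that treats filenames as raw bytes
effectively does; the second is roughly what normalized lookup does.
Storing the result of normalize() corresponds to rewriting the name,
while keeping the original string and normalizing only inside
same_name() corresponds to preserving the bytes and normalizing at
lookup time.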
