Re: [Nmh-workers] Non-ASCII Characters in bodies and subjects

Ken Hornstein Wed, 18 Jun 2014 06:22:00 -0700

>> That's not universally true anymore.  Some newer filesystems are
>> mandating that filenames are UTF-8 and enforcing normalization rules
>> (MacOS X and Solaris are two notable examples).
>
>Thanks, I didn't know.  Haven't used Solaris in years, and never bought
>Apple.


Let me amend this a bit; as I understand it, you have to enable that
behavior on Solaris.  It's the default behavior on MacOS X.

>> Solaris is better; the original bytes are preserved, but lookup is
>> done using normalized names so you can't have two filenames with the
>> same characters.
>
>What about globbing, especially on Mac OS X?  Given your two examples on
>Linux with bash,
>[...]

So, clearly we need some userspace support.  AFAIK, the globbing isn't
Unicode-aware; it's just matching on whatever readdir() returns.  Should
a ? match a byte?  A Unicode code point?  An abstract character?  I am
not sure, and I don't know whether anyone has settled this from a
standards point of view.
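To make the three choices concrete, here is a small sketch (an illustration, not how any shell actually implements globbing). It uses Python's fnmatch as a stand-in glob matcher and "café" spelled two ways: precomposed (NFC) and decomposed (NFD, the form a filesystem such as HFS+ on Mac OS X tends to hand back from readdir()). The helper name `matches` is hypothetical, and treating "NFC-compose, then match code points" as "match abstract characters" is only an approximation, since not every user-perceived character composes to a single code point.

```python
import fnmatch
import unicodedata

# "café" spelled two ways: precomposed U+00E9 (NFC) versus
# "e" followed by U+0301 COMBINING ACUTE ACCENT (NFD).
nfc_name = unicodedata.normalize("NFC", "caf\u00e9")
nfd_name = unicodedata.normalize("NFD", "caf\u00e9")

def matches(name, pattern):
    """Hypothetical helper: match `pattern` against `name` three ways."""
    # ? matches one byte of the UTF-8 encoding.
    as_bytes = fnmatch.fnmatchcase(name.encode("utf-8"), pattern.encode("utf-8"))
    # ? matches one Unicode code point.
    as_codepoints = fnmatch.fnmatchcase(name, pattern)
    # ? matches one "abstract character", approximated by composing both
    # sides to NFC first (not every grapheme composes to one code point).
    as_characters = fnmatch.fnmatchcase(
        unicodedata.normalize("NFC", name), unicodedata.normalize("NFC", pattern)
    )
    return as_bytes, as_codepoints, as_characters

for label, name in (("NFC", nfc_name), ("NFD", nfd_name)):
    print(label, matches(name, "caf?"))
# NFC (False, True, True)
# NFD (False, False, True)
```

Byte matching fails for both spellings because é is two or three bytes in UTF-8, and code-point matching fails only for the decomposed name, which is exactly the case a normalizing filesystem can produce behind the user's back.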

>Do you think NFKC would be better, so ? often matches what appears as a
>single rune and fi matches ligature ﬁ?

Hm.  I believe some network filesystems use NFKC, but I am neutral on
what should be done.  Should fi match ﬁ?  I cannot decide; I see
arguments for both.
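A quick way to see both sides of the NFKC question, again only a sketch using Python's fnmatch and a hypothetical `glob_match` helper that normalizes both the filename and the pattern before matching:

```python
import fnmatch
import unicodedata

# "ﬁle.txt", beginning with U+FB01 LATIN SMALL LIGATURE FI.
LIGATURE_NAME = "\ufb01le.txt"

def glob_match(name, pattern, form):
    """Hypothetical helper: normalize both sides to `form`, then glob-match."""
    return fnmatch.fnmatchcase(
        unicodedata.normalize(form, name), unicodedata.normalize(form, pattern)
    )

print(glob_match(LIGATURE_NAME, "fi*", "NFC"))      # False: NFC keeps the ligature
print(glob_match(LIGATURE_NAME, "fi*", "NFKC"))     # True: NFKC folds it to "fi"
print(glob_match(LIGATURE_NAME, "?le.txt", "NFC"))  # True: the ligature is one code point
print(glob_match(LIGATURE_NAME, "?le.txt", "NFKC")) # False: it is now two code points
```

NFKC makes fi match the ligature, but the same folding means a single ? no longer matches what the user sees as one glyph, which is one reason the choice is not obvious.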

--Ken

_______________________________________________
Nmh-workers mailing list
https://lists.nongnu.org/mailman/listinfo/nmh-workers
