Hello Ken,
> > The Unix kernel stores filenames as a run of bytes, not including
> > `/' and NUL.
>
> That's not universally true anymore. Some newer filesystems are
> mandating that filenames are UTF-8 and enforcing normalization rules
> (MacOS X and Solaris are two notable examples).
Thanks, I didn't know. Haven't used Solaris in years, and never bought
Apple.
> The only way of resolving this is to use the normalization rules for
> Unicode and do filename searching that way;
Sure.
> MacOS X actually rewrites all of the filenames using Normalization
> Form D (all characters in decomposed form, which means the regular
> character followed by the combining accents) and I think that sucks,
> but they didn't ask me.
I think I agree with you.
> Solaris is better; the original bytes are preserved, but lookup is
> done using normalized names so you can't have two filenames with the
> same characters.
What about globbing, especially on Mac OS X? Taking your two examples,
one precomposed and one decomposed, on Linux with bash:
$ touch résumé résumé
$ ls r?sum?
résumé
$ ls r?sum? | recode ..dump
UCS2 Mne Description
0072 r latin small letter r
00E9 e' latin small letter e with acute
0073 s latin small letter s
0075 u latin small letter u
006D m latin small letter m
00E9 e' latin small letter e with acute
000A LF line feed (lf)
$
$ ls r??sum??
résumé
$
Do you think NFKC would be better, so that ? often matches what appears
as a single rune and fi matches the ligature ﬁ (U+FB01)?
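
Here is a rough sketch of what I mean, in Python rather than shell because
its standard library exposes the normalization forms directly. It assumes
both the filename and the pattern are normalized to the same form before
matching. norm_match is a made-up name for illustration, not anything in
nmh or bash.

```python
import fnmatch
import unicodedata


def norm_match(name, pattern, form="NFKC"):
    """Hypothetical helper: glob-match after normalizing both sides.

    Both the candidate name and the pattern are converted to the same
    Unicode normalization form, so a precomposed and a decomposed
    spelling of the same text compare alike.
    """
    n = unicodedata.normalize(form, name)
    p = unicodedata.normalize(form, pattern)
    return fnmatch.fnmatchcase(n, p)


# The two spellings from the transcript above.
composed = "r\u00e9sum\u00e9"        # e-acute as one code point (6 in total)
decomposed = "re\u0301sume\u0301"    # e + combining acute (8 in total)

if __name__ == "__main__":
    for name in (composed, decomposed):
        raw = fnmatch.fnmatchcase(name, "r?sum?")
        normalized = norm_match(name, "r?sum?")
        print(len(name), raw, normalized)
    # NFKC also folds the fi ligature U+FB01 into plain "fi".
    print(norm_match("\ufb01le", "fi*"))
```

Without normalization only the precomposed name matches r?sum?; with NFKC
both do, and a pattern starting with plain fi also matches a name that
starts with the ligature. NFC would cover the accent case alone; it is the
K (compatibility) forms that fold the ligature.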
Cheers, Ralph.
_______________________________________________
Nmh-workers mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/nmh-workers