Hi Ken,
> > (3) assume charset=utf-8 (maybe allow this to be overridden in
> > profile)
>
> We already do (1) and (2). (3) is the problem. Other people who have
> thoughts on this topic are free to weigh in. Personally, I believe
> that if you're doing LANG=C, you shouldn't be dealing with any 8-bit
> characters at all. Isn't that what that means?

Agreed. I eventually moved from LC_ALL=C to LANG=en_GB.utf8 and it
isn't too painful these days. GNU grep and others have reduced the
performance hit they initially had, and for those times when I do want
a command, e.g. sort(1), to run in the C locale I use

$ cat ~/bin/C
#! /bin/sh
# LC_ALL has precedence over LANG according to POSIX[1], but we may as
# well stamp out any traces by setting LANG too.
# 1. The Open Group Base Specifications, Ch. 8 Environment Variables.
LC_ALL=C LANG=C exec -- "$@"
$
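
As a rough sanity check of what the wrapper buys you, here is a sketch
that can be run on its own. The function name in_c_locale is made up
for this example and merely stands in for the ~/bin/C script; it uses
env(1) rather than exec so that it works as a function.

```shell
#! /bin/sh
# Hypothetical stand-in for the ~/bin/C script above, written as a
# function so that this snippet is self-contained.
in_c_locale() {
    env LC_ALL=C LANG=C "$@"
}

# In the C locale sort(1) compares byte values, so every upper-case
# ASCII letter sorts before every lower-case one.
sorted=$(printf 'b\nA\na\nB\n' | in_c_locale sort)
printf '%s\n' "$sorted"
```

With the real script on PATH, `C sort' in place of `in_c_locale sort'
should behave the same way.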

BTW, WRT spotting multi-byte UTF-8 encoding, I don't think that's a
goer. The same byte sequence can be valid in both UTF-8 and GB2312,
especially if it's just the odd `£' or `拢' in ASCII text: the bytes
0xC2 0xA3 are `£' in UTF-8 and `拢' in GB2312.
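
To see the ambiguity for yourself, here is a small sketch. It assumes
an iconv(1) that knows the charset names UTF-8 and GB2312, as glibc's
does; the variable names are just for illustration.

```shell
#! /bin/sh
# Decode the same two bytes, 0xC2 0xA3 (octal 302 243), under both
# charsets. Neither iconv run fails, so both readings are valid.

# Read as UTF-8 the bytes are U+00A3, the pound sign.
as_utf8=$(printf '\302\243' | iconv -f UTF-8 -t UTF-8)

# Read as GB2312 the bytes are a single hanzi, U+62E2, re-encoded
# here as UTF-8 for display.
as_gb2312=$(printf '\302\243' | iconv -f GB2312 -t UTF-8)

printf 'UTF-8:  %s\nGB2312: %s\n' "$as_utf8" "$as_gb2312"
```

Both conversions succeed, so validity alone can't tell a detector
which charset the sender meant.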
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
_______________________________________________
Nmh-workers mailing list
https://lists.nongnu.org/mailman/listinfo/nmh-workers