I am in no way an expert on this, but I won't let that stop me. It seems to me that the only solution is to use Unicode internally. Disgusting as it seems to those of us who are old enough to hoard bytes, we might want to consider using something other than UTF-8 for the internal representation. Using UTF-16 wouldn't be horrible, but the Unicode folks outgrew 16 bits, so a code point now needs 21 bits (they go up to U+10FFFF), which really means using 32 internally.
The reason I think Unicode is appropriate is that it was designed to be a superset of all other character sets. Since the RFCs allow the mixing of character sets, Unicode lets them all be represented without having to encode "bank switching". I realize that doing this requires a library that handles Unicode characters properly, which is not a trivial task. On the output side, we just have to do the best we can if characters in the input locale can't be represented in the output locale. This is independent of the internal representation.

Jon

_______________________________________________
Nmh-workers mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/nmh-workers
