>> So ... what would that mean, exactly?  Ignore the locale setting and
>> always output UTF-8?
>
>Well, yes, the code would be writing UTF-8, with the knowledge of how
>many cells have been occupied, e.g. one for the combining `a⃞', but it
>could complain about the non-UTF-8 locale setting, or try and set up
>`fire and forget' converter on open and opening files if it was easy
>enough to be worth the bother.

Help me out here, because I'm trying to translate your concepts into
actual code and I'm having some problems seeing how it would work.

Assuming we don't bring in a library like ICU, it's difficult for us to
reliably determine the width of a Unicode character.  Specifically:

- The POSIX standard functions for this, wcwidth() and wcswidth(), work
  on the current locale, which is not guaranteed to support UTF-8 (or
  even support 8-bit characters).

- The xlocale functions, which allow one to pass a specific locale to
  functions like wcwidth(), are not part of POSIX.

- Even if we used xlocale (or just overrode the global locale in every
  nmh program), it turns out there's not a reliable UTF-8 compatible
  default we can use.  We ran into this in the test suite: some people
  just don't install all of the locales, so we can't assume en_US.UTF-8
  (or en_GB.UTF-8, or whatever).

I'm unclear how you wanted to use the iconv utility; is the idea to just
output everything in UTF-8 and run iconv as a filter for all text
output?  I think that might have unintended consequences, but putting
that aside, there are other issues.  For one, iconv can't do character
substitution on conversion failure (at least the POSIX iconv cannot; I
am aware that GNU iconv can).  Even if it can, I am unsure we can
maintain the correct column position when dealing with things like
combining characters.

But hey, if I'm wrong I'd be glad to hear about it.  I think it's a much
tougher problem than people realize.

--Ken
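
To make the width problem concrete, here is a minimal sketch of what
counting columns would involve if we avoided both wcwidth() and ICU:
decode the UTF-8 by hand and look each code point up in our own width
table.  The names utf8_decode(), cp_width() and utf8_columns() are
hypothetical and are not nmh code.  The width table is deliberately tiny
and incomplete (a few combining and East Asian wide ranges only); a real
one would have to track the Unicode data files, which is exactly the
maintenance burden in question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes available).
 * Stores the code point in *cp and returns the bytes consumed, or 0 if the
 * sequence is malformed, truncated, overlong, or encodes a surrogate. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;               /* stray continuation or invalid lead byte */
    }
    if (n < len)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */
    *cp = c;
    return len;
}

/* Illustrative width table, NOT complete: returns -1 for control
 * characters, 0 for a few combining ranges, 2 for a few wide ranges,
 * and 1 for everything else. */
static int cp_width(uint32_t cp)
{
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;
    if ((cp >= 0x0300 && cp <= 0x036F) ||   /* Combining Diacritical Marks */
        (cp >= 0x20D0 && cp <= 0x20FF))     /* Combining Marks for Symbols */
        return 0;
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo leading consonants */
        (cp >= 0x3040 && cp <= 0x30FF) ||   /* Hiragana and Katakana */
        (cp >= 0x4E00 && cp <= 0x9FFF) ||   /* CJK Unified Ideographs */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul Syllables */
        (cp >= 0xFF01 && cp <= 0xFF60))     /* Fullwidth forms */
        return 2;
    return 1;
}

/* Columns occupied by the first n bytes of a UTF-8 string, or -1 if the
 * input is malformed or contains a control character. */
int utf8_columns(const char *str, size_t n)
{
    const unsigned char *s = (const unsigned char *)str;
    int cols = 0;

    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(s, n, &cp);
        int w;

        if (used == 0)
            return -1;
        w = cp_width(cp);
        if (w < 0)
            return -1;
        cols += w;
        s += used;
        n -= used;
    }
    return cols;
}
```

With this table, the `a⃞' example from the quoted text (an "a" followed
by U+20DE) counts as one column, since U+20DE falls in the combining
range.  Note that this only answers the width question when the output
really is UTF-8; it does nothing for column tracking once the text has
been passed through a converter to some other charset.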
