Re: [Nmh-workers] nmh architecture discussion: format engine character set

Ken Hornstein Mon, 10 Aug 2015 10:07:08 -0700

>> So ... what would that mean, exactly?  Ignore the locale setting and
>> always output UTF-8?
>
>Well, yes, the code would be writing UTF-8, with the knowledge of how
>many cells have been occupied, e.g. one for the combining `a⃞', but it
>could complain about the non-UTF-8 locale setting, or try and set up
>`fire and forget' converter on open and opening files if it was easy
>enough to be worth the bother.


Help me out here, because I'm trying to translate your concepts into
actual code and I'm having some problems seeing how it would work.

Assuming we don't bring in a library like ICU, it's difficult for us
to reliably determine the width of a Unicode character.  Specifically:

- The POSIX standard functions for this, wcwidth() and wcswidth(), work
  on the current locale, which is not guaranteed to support UTF-8 (or
  even support 8-bit characters).

- The xlocale functions, which allow one to pass a specific locale
  to functions like wcwidth(), are not part of POSIX.

- Even if we used xlocale (or just overrode the global locale in every
  nmh program), it turns out there's no reliable UTF-8 compatible
  default we can use.  We ran into this in the test suite: some people
  just don't install all of the locales, so we can't assume en_US.UTF-8
  (or en_GB.UTF-8, or whatever) is present.
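
To make the third point concrete, here is a rough sketch of what probing
for a UTF-8 locale would look like.  The function name find_utf8_locale()
and the candidate list are made up for illustration; nothing like this
exists in nmh.  It also assumes that nl_langinfo(CODESET) reports exactly
"UTF-8" for such a locale, which holds on glibc but is not guaranteed
everywhere.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical candidates; none of them is guaranteed to be installed. */
static const char *const utf8_candidates[] = {
    "C.UTF-8", "en_US.UTF-8", "en_GB.UTF-8", NULL
};

/*
 * Try each candidate as LC_CTYPE and return the first one whose codeset
 * is reported as "UTF-8".  If none works, restore the previous LC_CTYPE
 * setting and return NULL.
 */
const char *find_utf8_locale(void)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    size_t i;

    /* Copy the current name; later setlocale() calls may overwrite it. */
    if (cur != NULL && (saved = malloc(strlen(cur) + 1)) != NULL)
        strcpy(saved, cur);

    for (i = 0; utf8_candidates[i] != NULL; i++) {
        if (setlocale(LC_CTYPE, utf8_candidates[i]) != NULL &&
            strcmp(nl_langinfo(CODESET), "UTF-8") == 0) {
            free(saved);
            return utf8_candidates[i];
        }
    }

    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return NULL;
}
```

The NULL return is the case that matters here: on a machine with none of
those locales installed, wcwidth() has nothing UTF-8 aware to work from,
and the caller is left with no answer.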

I'm unclear how you wanted to use the iconv utility; is the idea just to
output everything in UTF-8 and run iconv as a filter for all text
output?  I think that might have unintended consequences, but putting
that aside there are other issues.  For one, iconv can't do character
substitution on conversion failure (at least the POSIX iconv cannot; I
am aware that GNU iconv can).  Even if it can, I am unsure we can maintain
the correct column position when dealing with things like combining
characters.
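
For what it's worth, this is roughly what caller-side substitution around
iconv() would look like.  It is only a sketch: convert_with_subst() is a
name made up for this example, the '?' replacement is an arbitrary choice,
and it assumes an iconv() that fails with EILSEQ on a character the target
charset cannot represent (glibc does this; POSIX leaves the behaviour
implementation-defined, and some implementations substitute silently).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert the UTF-8 string `in` to charset `tocode`, storing a
 * NUL-terminated result in `out` (capacity `outsize` bytes).  Each time
 * iconv() stops with EILSEQ, one UTF-8 sequence is skipped in the input
 * and a '?' is written instead.  Returns the number of substitutions
 * made here, or -1 on any other failure.
 */
int convert_with_subst(const char *tocode, const char *in,
                       char *out, size_t outsize)
{
    iconv_t cd;
    char *ip = (char *)in;          /* iconv() does not modify the input */
    size_t ileft = strlen(in);
    char *op = out;
    size_t oleft;
    int subst = 0;

    if (outsize == 0)
        return -1;
    oleft = outsize - 1;            /* reserve room for the final NUL */

    cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    while (ileft > 0) {
        if (iconv(cd, &ip, &ileft, &op, &oleft) != (size_t)-1)
            break;                  /* all remaining input was converted */
        if (errno != EILSEQ || oleft == 0) {
            iconv_close(cd);
            return -1;              /* E2BIG, EINVAL, or no room for '?' */
        }
        /* Skip the lead byte and any continuation bytes (10xxxxxx). */
        do {
            ip++;
            ileft--;
        } while (ileft > 0 && ((unsigned char)*ip & 0xC0) == 0x80);
        *op++ = '?';
        oleft--;
        subst++;
    }

    *op = '\0';
    iconv_close(cd);
    return subst;
}
```

Even with this, the column problem remains.  Under the EILSEQ assumption
above, converting "a" followed by U+20DE to ASCII yields "a?", which takes
two cells where the UTF-8 original took one, so any width computed before
the conversion is wrong afterwards.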

But hey, if I'm wrong I'd be glad to hear about it.  I think it's a much
tougher problem than people realize.

--Ken

_______________________________________________
Nmh-workers mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/nmh-workers
