I tried running the Info reader from Texinfo 7.2.90 on the ELisp
Reference manual (which uses UTF-8 to encode non-ASCII characters),
after setting the console encoding to UTF-8 (a.k.a. "codepage 65001").
It didn't work as I expected: the UTF-8 sequences were displayed as
raw bytes instead of Unicode characters.

AFAICT, this happens because display.c:printed_representation doesn't
recognize UTF-8 byte sequences as such, and instead decides that they
are just raw bytes.  That led me to this fragment:

  const char *
  printed_representation (mbi_iterator_t *iter, int *delim, size_t pl_chars,
                          int *pchars, int *pbytes)
  {
    struct text_buffer *rep = &printed_rep;

    char *cur_ptr = (char *) mbi_cur_ptr (*iter);
    int cur_len = mb_len (mbi_cur (*iter));

    text_buffer_reset (&printed_rep);

    if (mb_isprint (mbi_cur (*iter)))

This uses multibyte iteration and functions/macros like mb_len and
mb_isprint from Gnulib, and they evidently don't recognize UTF-8
encoding in this case.  I didn't have time to look closely at the
implementation (which is quite complex, and seems to use 32-bit
Unicode codepoints and various functions that replace(?) the likes of
mbrlen and mbrtowc), but it seems to still use mbstate_t as declared
in the system headers.

So my question to Bruno is: do the above functions/macros rely on
mbrlen and mbrtowc from the Windows C runtime, or do they replace them
from the ground up?

I'm asking because, AFAIK, these functions as implemented in the
legacy MSVCRT runtime library don't support UTF-8 at all; only the
newer UCRT runtime does.  And even UCRT supports UTF-8 only when the
system locale is set to something.UTF-8; just setting the terminal's
encoding to UTF-8 is not enough.  By contrast, I would like the Info
reader to be capable of UTF-8 output whenever it runs on a Windows
terminal whose encoding is UTF-8, and I'd like to support this in both
the MSVCRT and UCRT builds of the Info reader.  So if Gnulib replaces
the above functions with its own (or can replace them, given some
build-time knobs), I'd like to try that and/or fix the code involved
to pay attention to the terminal's encoding, not just to the current
system locale, if that's possible.

Bruno?
