On Sat, Jan 17, 2026 at 09:28:30AM +0200, Eli Zaretskii wrote:
> If the above doesn't produce any problems (except with occasional wide
> characters), then it's an easy solution, I think. And we could even
> do better if in the "UTF-8 initial byte" clause we compute the Unicode
> codepoint of the character and call wcwidth (which on Windows will
> call the Gnulib wcwidth and on other systems will DTRT since the above
> code should only be used when the locale's codeset is UTF-8).
>
> > We could add an Info variable to customize this behaviour.
>
> That'd be great, thanks. Would it be possible to add that in this
> release?
Here's a more finished patch. It would be fine to include this in the
next release if you can confirm that it works acceptably.

wcwidth takes a wchar_t argument, and we can't guarantee the format of
this type (the C standard leaves the encoding of wchar_t
implementation-defined). Moreover, in info/pcterm.c we redefine wcwidth
because there was a performance issue with calling the gnulib
definition. Reading the UTF-8 sequence, obtaining the code point and
calling wcwidth seems to me an unnecessary complication for a marginal
use case.
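For illustration, the extra step under discussion would look roughly like this: decode one UTF-8 sequence to a code point before any width lookup. This is only a minimal sketch. utf8_decode is a hypothetical helper, not part of Texinfo or gnulib. It deliberately stops at a uint32_t code point, because converting that to a wchar_t for wcwidth is exactly the part whose format can't be guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not in Texinfo or gnulib).
   Decode one UTF-8 sequence starting at S, with LEN bytes available.
   On success store the code point in *CP and return the number of
   bytes consumed (1-4).  Return 0 for malformed or truncated input.  */
size_t
utf8_decode (const unsigned char *s, size_t len, uint32_t *cp)
{
  size_t n, i;
  uint32_t c;

  if (len == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];               /* Plain ASCII.  */
      return 1;
    }
  else if ((s[0] & 0xe0) == 0xc0)
    { n = 2; c = s[0] & 0x1f; } /* 110xxxxx: 2-byte sequence.  */
  else if ((s[0] & 0xf0) == 0xe0)
    { n = 3; c = s[0] & 0x0f; } /* 1110xxxx: 3-byte sequence.  */
  else if ((s[0] & 0xf8) == 0xf0)
    { n = 4; c = s[0] & 0x07; } /* 11110xxx: 4-byte sequence.  */
  else
    return 0;                   /* Stray continuation or invalid byte.  */

  if (len < n)
    return 0;                   /* Truncated sequence.  */
  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xc0) != 0x80)
        return 0;               /* Expected a 10xxxxxx continuation byte.  */
      c = (c << 6) | (s[i] & 0x3f);
    }

  /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.  */
  if ((n == 2 && c < 0x80) || (n == 3 && c < 0x800)
      || (n == 4 && c < 0x10000)
      || (c >= 0xd800 && c <= 0xdfff) || c > 0x10ffff)
    return 0;

  *cp = c;
  return n;
}
```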
diff --git a/info/display.c b/info/display.c
index 4df6a45063..6c71bd9799 100644
--- a/info/display.c
+++ b/info/display.c
@@ -482,6 +482,8 @@ display_process_line (WINDOW *win,
 static struct text_buffer printed_rep = { 0 };
+int raw_utf8_output_p = 0;
+
 /* Return pointer to string that is the printed representation of character
    (or other logical unit) at ITER if it were printed at screen column
    PL_CHARS. Use ITER_SETBYTES (util.h) on ITER if we need to advance
@@ -501,7 +503,38 @@ printed_representation (mbi_iterator_t *iter, int *delim, size_t pl_chars,
   text_buffer_reset (&printed_rep);
-  if (mb_isprint (mbi_cur (*iter)))
+  if (raw_utf8_output_p && (unsigned char) *cur_ptr >= 0x80)
+    {
+      /* For systems without a working UTF-8 locale but where UTF-8
+         actually works on the terminal. This may happen in an MS-Windows
+         UTF-8 terminal with the MSVCRT run-time.
+
+         Pass through UTF-8 bytes to the terminal. Count each character as
+         a single screen column. This at least allows viewing (mostly
+         correctly) non-ASCII characters in UTF-8 Info files.
+
+         Searching, user entry etc. of non-ASCII characters may still
+         not work correctly. */
+
+      unsigned char c = *cur_ptr;
+      if ((c & 0xc0) == 0xc0)
+        {
+          /* UTF-8 initial byte. */
+          *pchars = 1;
+          *pbytes = 1;
+          ITER_SETBYTES (*iter, 1);
+          return cur_ptr;
+        }
+      if ((c & 0xc0) == 0x80)
+        {
+          /* UTF-8 continuation byte. */
+          *pchars = 0;
+          *pbytes = 1;
+          ITER_SETBYTES (*iter, 1);
+          return cur_ptr;
+        }
+    }
+  else if (mb_isprint (mbi_cur (*iter)))
     {
       /* cur.wc gives a wchar_t object. See mbiter.h in the
          gnulib/lib directory. */
diff --git a/info/variables.c b/info/variables.c
index b6d4371de7..e91869ff57 100644
--- a/info/variables.c
+++ b/info/variables.c
@@ -164,6 +164,10 @@ VARIABLE_ALIST info_variables[] = {
       N_("How to print the information line at the start of a node"),
       CHOICES_VAR(nodeline_print, nodeline_choices) },
+  { "raw-utf8-output",
+      N_("Always pass through non-ASCII UTF-8 bytes in files to terminal"),
+      ON_OFF_VAR(raw_utf8_output_p) },
+
   { NULL }
 };
diff --git a/info/variables.h b/info/variables.h
index 5454ab942e..03d263c6a2 100644
--- a/info/variables.h
+++ b/info/variables.h
@@ -79,6 +79,7 @@ extern int key_time;
 extern int mouse_protocol;
 extern int follow_strategy;
 extern int nodeline_print;
+extern int raw_utf8_output_p;
 typedef struct {
   unsigned long mask;
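The column rule that the patch applies in printed_representation can be seen in isolation in the sketch below. raw_utf8_columns is a hypothetical helper written only for illustration. It assumes the text consists of printable ASCII plus UTF-8, since control characters still take the existing ^X path in the real code.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only (not part of the patch).
   Return the number of screen columns that raw-utf8-output mode would
   assign to the first LEN bytes of S, assuming S holds only printable
   ASCII and UTF-8.  ASCII bytes and UTF-8 initial bytes (11xxxxxx)
   count as one column; continuation bytes (10xxxxxx) count as zero.
   Wide characters are not special-cased, matching the patch.  */
size_t
raw_utf8_columns (const char *s, size_t len)
{
  size_t cols = 0, i;

  for (i = 0; i < len; i++)
    {
      unsigned char c = s[i];
      if (c < 0x80 || (c & 0xc0) == 0xc0)
        cols++;         /* ASCII, or the start of a multibyte sequence.  */
      /* Otherwise (c & 0xc0) == 0x80: continuation byte, zero width.  */
    }
  return cols;
}
```

For example, "caf\xc3\xa9" (five bytes) counts as four columns, as expected, while a CJK character such as U+4E2D (three bytes) counts as one column even though terminals normally draw it two columns wide. That is the "occasional wide characters" problem mentioned above.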