Re: [PATCH] wcrtomb: fix CESU-8 value of leftover lone high surrogate

Christian Franke Mon, 30 Jun 2025 07:53:23 -0700

Corinna Vinschen wrote:

On Jun 29 19:13, Christian Franke wrote:

Fixes the CESU-8 value, but not the missing encoding if the high surrogate
is at the very end of the string.

Are you going to provide a patch for that issue?

Not very soon, as this possibly requires non-trivial rework including comprehensive testing.

The function behind __WCTOMB() must also be called with the final L'\0' as input. This is not the case. For example, in _wcstombs_r() only the second __WCTOMB() is called with L'\0'. The (s == NULL) part implicitly assumes that such a final call would only append '\0' and return 1.


newlib/libc/stdlib/wctomb_r.c:

size_t
_wcstombs_r (...)
{
  ...
  if (s == NULL)
    {
      ...
      while (*pwcs != 0)
        {
          bytes = __WCTOMB (r, buff, *pwcs++, state);
          ...
          num_bytes += bytes;
        }
      return num_bytes;
    }
  else
    {
      while (n > 0)
        {
          bytes = __WCTOMB (r, buff, *pwcs, state);
          ...
          if (*pwcs == 0x00)
            return ptr - s - (n >= bytes);
          ...
        }
      ...
    }
}
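
To make the failure mode concrete, here is a minimal sketch of a hypothetical toy converter. The names toy_state, toy_wctomb(), count_without_terminator() and count_with_terminator() are made up for illustration and are not newlib code. As a simplifying assumption, a high surrogate is only buffered and produces no output until the next code unit arrives; newlib's real converter may already emit bytes at that point. A length count that stops before the terminator, like the (s == NULL) loop above, misses the 3-byte CESU-8 value of a trailing lone high surrogate, while a count that also feeds L'\0' picks it up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy state, NOT newlib's mbstate_t: a buffered high
   surrogate, or 0 if none is pending. */
typedef struct
{
  uint16_t pending_high;
} toy_state;

/* Encode a non-surrogate BMP unit (or a lone surrogate, CESU-8 style)
   as 1..3 UTF-8 bytes. */
static size_t
put_bmp (unsigned char *s, uint16_t c)
{
  if (c < 0x80)
    {
      s[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      s[0] = 0xc0 | (c >> 6);
      s[1] = 0x80 | (c & 0x3f);
      return 2;
    }
  s[0] = 0xe0 | (c >> 12);
  s[1] = 0x80 | ((c >> 6) & 0x3f);
  s[2] = 0x80 | (c & 0x3f);
  return 3;
}

/* Convert one UTF-16 unit into s (room for 6 bytes).  Simplifying
   assumption: a high surrogate is only buffered and yields 0 bytes; it is
   written out when the next unit arrives, including the final L'\0'. */
static size_t
toy_wctomb (unsigned char *s, uint16_t wc, toy_state *st)
{
  size_t n = 0;

  if (st->pending_high != 0)
    {
      if (wc >= 0xdc00 && wc <= 0xdfff)
        {
          /* Complete pair: one 4-byte UTF-8 sequence. */
          uint32_t cp = 0x10000
                        + (((uint32_t) (st->pending_high & 0x3ff) << 10)
                           | (wc & 0x3ff));
          st->pending_high = 0;
          s[0] = 0xf0 | (cp >> 18);
          s[1] = 0x80 | ((cp >> 12) & 0x3f);
          s[2] = 0x80 | ((cp >> 6) & 0x3f);
          s[3] = 0x80 | (cp & 0x3f);
          return 4;
        }
      /* Leftover lone high surrogate: flush its CESU-8 value first. */
      n = put_bmp (s, st->pending_high);
      st->pending_high = 0;
    }
  if (wc >= 0xd800 && wc <= 0xdbff)
    {
      st->pending_high = wc;
      return n;
    }
  return n + put_bmp (s + n, wc);
}

/* Mirrors the (s == NULL) loop: stops before the terminator. */
static size_t
count_without_terminator (const uint16_t *pwcs)
{
  toy_state st = { 0 };
  unsigned char buf[6];
  size_t num = 0;

  while (*pwcs != 0)
    num += toy_wctomb (buf, *pwcs++, &st);
  return num;
}

/* Also feeds L'\0' so a pending surrogate is flushed; the '\0' byte
   itself is then subtracted, as wcstombs() does not count it. */
static size_t
count_with_terminator (const uint16_t *pwcs)
{
  toy_state st = { 0 };
  unsigned char buf[6];
  size_t num = 0;

  do
    num += toy_wctomb (buf, *pwcs, &st);
  while (*pwcs++ != 0);
  assert (num >= 1);   /* the terminator call always yields '\0' */
  return num - 1;
}
```

For the input { 0x41, 0xd83d, 0 }, count_without_terminator() returns 1 while count_with_terminator() returns 4: one byte for 'A' plus ED A0 BD for the lone surrogate. For a complete pair both counts agree.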

...

+      tmp = (((state->__value.__wchb[0] << 16 | state->__value.__wchb[1] << 8)
+           - 0x10000) >> 10) | 0xd800;
        *s++ = 0xe0 | ((tmp & 0xf000) >> 12);
        *s++ = 0x80 | ((tmp &  0xfc0) >> 6);
        *s++ = 0x80 |  (tmp &   0x3f);
--
2.45.1
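
The patched expression relies on the converter having stashed a provisional code point for the high surrogate. The sketch below assumes, based only on the << 16 and << 8 shifts in the patch and not verified against newlib's struct, that bits 16..23 and 8..15 of tmp = ((hi & 0x3ff) << 10) + 0x10000 are kept in __wchb[0] and __wchb[1]. The helpers store_high() and cesu8_from_state() are hypothetical. Because bits 0..9 of tmp are always zero, those two bytes are enough to recover the surrogate exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper.  Assumed layout (inferred from the patch's shifts):
   wchb[0] = bits 16..23, wchb[1] = bits 8..15 of the provisional value. */
static void
store_high (uint16_t hi, unsigned char wchb[2])
{
  uint32_t tmp;

  assert (hi >= 0xd800 && hi <= 0xdbff);
  tmp = ((uint32_t) (hi & 0x3ff) << 10) + 0x10000;
  wchb[0] = (tmp >> 16) & 0xff;
  wchb[1] = (tmp >> 8) & 0xff;
}

/* Recover the surrogate with the patched expression, then write its
   3-byte CESU-8 value.  Returns the recovered surrogate. */
static uint16_t
cesu8_from_state (const unsigned char wchb[2], unsigned char out[3])
{
  uint32_t tmp = (((uint32_t) wchb[0] << 16 | (uint32_t) wchb[1] << 8)
                  - 0x10000) >> 10 | 0xd800;

  out[0] = 0xe0 | ((tmp & 0xf000) >> 12);
  out[1] = 0x80 | ((tmp &  0xfc0) >> 6);
  out[2] = 0x80 |  (tmp &   0x3f);
  return (uint16_t) tmp;
}
```

For example, the lone high surrogate U+D83D comes out as the three bytes ED A0 BD.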

LGTM, please push.


Done.

--
Thanks,
Christian

