Pádraig Brady <[email protected]> writes:

> On 18/10/2025 22:05, Collin Funk wrote:
>> Pádraig Brady <[email protected]> writes:
>> 
>>> There were various other multi-byte blanks issues,
>>> and multi-byte issues in general when I looked further.
>>>
>>> The attached 3 further patches should make numfmt fully support multi-byte.
>> numfmt is a nice case where we don't need to optimize MB_CUR_MAX == 1,
>> thanks.
>
> Right. But that got me thinking that we could optimize
> in various cases, rather than resorting to mbsstr().
> The attached implements mbsmbchr(mbs, mbc) to more efficiently
> search for a multi-byte char in a multi-byte string,
> especially with the usual UTF-8 charset
> (which is determined with a single mbrtoc32() call per process).

I wonder if that function is worth putting in gl/ under LGPL in case we
want to use it in other programs and/or move it to Gnulib. It seems
useful to me.
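
To make the idea concrete, here is a minimal sketch of why the UTF-8 case
can be fast. It is not the code from the attached patch: the name
utf8_find_char and its signature are hypothetical, and it assumes both
arguments are valid, NUL-terminated UTF-8 and that the second holds
exactly one character. Other encodings would still need a
character-aware search such as mbsstr().

```c
#include <string.h>

/* Hypothetical helper, not the mbsmbchr() from the patch.
   Return a pointer to the first occurrence in S of the single
   character encoded in C, or NULL if there is none.
   Assumes S and C are valid NUL-terminated UTF-8.  */
static char *
utf8_find_char (char const *s, char const *c)
{
  /* A one-byte character is ASCII, and an ASCII byte never occurs
     inside a UTF-8 multi-byte sequence, so strchr is enough.  */
  if (c[0] != '\0' && c[1] == '\0')
    return strchr (s, c[0]);

  /* Lead bytes and continuation bytes are disjoint in UTF-8, so a
     byte-wise match of a complete character can only start on a
     character boundary.  */
  return strstr (s, c);
}
```

With this sketch, searching "1\xe2\x80\x83K" for "\xe2\x80\x83" (EM SPACE)
returns a pointer to the second byte of the string.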

> +      mbstate_t mbstate = {0,};

The following is slightly more efficient:

    mbstate_t mbstate; mbszero (&mbstate);

> +      is_utf8 = mbrtoc32 (&w, "\xe2\x9f\xb8", 3, &mbstate) == 3 && w == 0x27F8;

You might want to copy the test from lib/quotearg.c instead, for
consistency:

   /* snipped text...
     If the current encoding is consistent with UTF-8 for U+2018,
     assume that the locale uses UTF-8.  This is safe in practice,
     and means we need not use a function like locale_charset that
     has other dependencies.  */
  static char const quote[][4] = { "\xe2\x80\x98", "\xe2\x80\x99" };
  char32_t w;
  mbstate_t mbs; mbszero (&mbs);
  if (mbrtoc32 (&w, quote[0], 3, &mbs) == 3 && w == 0x2018)
    return quote[msgid[0] == '\''];
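
For anyone who wants to try that probe outside of Gnulib, a standalone
sketch follows. The function name encoding_looks_like_utf8 is
hypothetical, memset stands in for Gnulib's mbszero, and it assumes a
C11 <uchar.h> that provides mbrtoc32 with char32_t holding Unicode code
points. The caller is expected to have called setlocale already.

```c
#include <stdbool.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: return true if the current LC_CTYPE encoding
   decodes the UTF-8 bytes of U+2018 as the single character U+2018.
   As in lib/quotearg.c, this is a heuristic, not a charset query.  */
static bool
encoding_looks_like_utf8 (void)
{
  char32_t w;
  mbstate_t mbs;

  /* Portable stand-in for Gnulib's mbszero (&mbs).  */
  memset (&mbs, 0, sizeof mbs);

  return mbrtoc32 (&w, "\xe2\x80\x98", 3, &mbs) == 3 && w == 0x2018;
}
```

Before any setlocale call the program runs in the "C" locale, where the
probe should report false.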

Collin
