On 02/04/2026 21:25, Bruno Haible wrote:
Paul Eggert wrote:
I'm a little lost here. Why do we always replace mbrtowc on glibc? And
why is mbrtoc32 not also replaced on glibc?

Gnulib overrides mbrtowc on glibc:
   REPLACE_MBRTOWC=1
   because gl_cv_func_mbrtowc_C_locale_sans_EILSEQ=no /
   MBRTOWC_IN_C_LOCALE_MAYBE_EILSEQ=1

Gnulib also overrides mbrtoc32 on glibc:
   REPLACE_MBRTOC32=1
   because gl_cv_func_mbrtoc32_C_locale_sans_EILSEQ=no /
   MBRTOC32_IN_C_LOCALE_MAYBE_EILSEQ=1

We're still relying on the above for this optimization to be enabled,
but I think that's unlikely to change, so it's probably fine.
If it were easy for the wchar-single module to ensure the replacement
of these functions, that would be good, I think. No worries either way.

Pádraig Brady wrote:
In the attached I adjusted things so that the efficient
dispatch routines are used once the wchar-single module is referenced.
I'm not sure about this approach

This patch has a major problem: it reuses the code path meant for AIX, which
requires
   1. overriding mbstate_t,
   2. locking around mbtowc() calls for non-UTF-8 locales.

Instead, the intended speedup (on glibc systems, in a UTF-8 locale) can
be obtained by inlining glibc-compatible code for mbrtowc/mbrtoc32 specialized
to UTF-8.
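
To illustrate what "specialized to UTF-8" can mean, here is a minimal
sketch of a stateless single-character UTF-8 decoder. It is not the code
from the committed patches: the name utf8_decode_sketch and its return
convention (modelled loosely on mbrtoc32, but without an mbstate_t and
returning 1 rather than 0 for a NUL byte) are assumptions made for this
example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the gnulib implementation.
   Decode one UTF-8 character from s[0..n-1] into *pc.
   Returns the number of bytes consumed (1..4), (size_t) -1 for an
   invalid sequence, or (size_t) -2 if the input ends before the
   sequence is complete.  Overlong and surrogate checks are applied
   only once all bytes of the sequence are present.  */
static size_t
utf8_decode_sketch (uint32_t *pc, const unsigned char *s, size_t n)
{
  assert (pc != NULL && s != NULL);
  if (n == 0)
    return (size_t) -2;

  unsigned char c = s[0];
  if (c < 0x80)
    {
      /* ASCII fast path: a single byte, no further checks.  */
      *pc = c;
      return 1;
    }

  size_t len;    /* total length implied by the lead byte */
  uint32_t min;  /* smallest code point allowed for that length */
  uint32_t wc;   /* payload bits accumulated so far */
  if (c >= 0xC2 && c <= 0xDF)
    { len = 2; min = 0x80; wc = c & 0x1F; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; min = 0x800; wc = c & 0x0F; }
  else if (c >= 0xF0 && c <= 0xF4)
    { len = 4; min = 0x10000; wc = c & 0x07; }
  else
    return (size_t) -1;  /* stray continuation byte or invalid lead */

  for (size_t i = 1; i < len; i++)
    {
      if (i == n)
        return (size_t) -2;          /* truncated input */
      if ((s[i] & 0xC0) != 0x80)
        return (size_t) -1;          /* not a continuation byte */
      wc = (wc << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and values above U+10FFFF.  */
  if (wc < min || wc > 0x10FFFF || (wc >= 0xD800 && wc <= 0xDFFF))
    return (size_t) -1;

  *pc = wc;
  return len;
}
```

Because such a decoder never consults locale tables or conversion state,
the ASCII case reduces to one comparison, which is presumably where much
of an inlined fast path's benefit would come from.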

I'm committing the attached two patches. They don't cause test failures in
coreutils.

but it works with coreutils
on glibc-2.43 at least, and cut -c (mcel) is 2.6x faster,
and wc -m (mbrtoc32) is 2x faster.

My test case is
   $ time src/wc -m mb10000.in
where mb10000.in is attached.

I observe that it is 2x faster. The profiling (attached, done with gprofng-gui,
see https://gitlab.com/ghwiki/gnow-how/-/wikis/Profiling/with_sampling )
shows that the mbrtoc32 time is reduced from 3.83 sec to 1.49 sec.
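
For readers unfamiliar with why mbrtoc32 dominates such a profile, the
following is a minimal sketch of a character-counting loop that calls
mbrtoc32 once per character. It is not the wc source: the function name
count_chars_sketch is hypothetical, and counting each invalid byte as one
character is a choice made for this example, not a statement about what
wc -m does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper, not taken from coreutils.
   Count the characters in buf[0..len-1] in the current locale,
   making one mbrtoc32 call per character.  */
static size_t
count_chars_sketch (const char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  mbstate_t state;
  memset (&state, 0, sizeof state);

  size_t count = 0;
  size_t i = 0;
  while (i < len)
    {
      char32_t c;
      size_t r = mbrtoc32 (&c, buf + i, len - i, &state);
      if (r > len - i)
        {
          /* (size_t) -1, -2 and -3 are all larger than any remaining
             length: treat the byte as one character (an assumption of
             this sketch) and restart from the initial state.  */
          memset (&state, 0, sizeof state);
          r = 1;
        }
      else if (r == 0)
        r = 1;  /* a NUL character occupies one byte */
      i += r;
      count++;
    }
  return count;
}
```

With a loop of this shape the per-call cost of mbrtoc32 is paid for every
character of input, so a cheaper mbrtoc32 translates almost directly into
a shorter run time.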

I can't observe a speedup on 'cut -c1 mb10000.in' because 'cut' does not
operate on multibyte characters:
   - option 'c' is equivalent to option 'b',
   - the profile of function cut_bytes shows no calls to multibyte functions.
Maybe you are using a modified 'cut' program? Or, can you attach your input
file (compressed)?

I'm not committing the proposed change to lib/mcel.h, because I don't have
a test case where it would make a difference. If you have one, please show it.

Yes, I'm working on a multi-byte cut, which I'll probably push
in the next day or so. It's currently at:

  $ git clone https://github.com/pixelb/coreutils.git
  $ cd coreutils
  $ git checkout cut-mb

Anyway, I tested your change and it works really well.
I need to remove the '#undef mbrtoc32' from mcel.h to
get the win there, of course.  Again I get the same 2.6x win
as seen with my previous patch:

  $ yes $(yes éééááé | head -n9 | paste -s -d,) |
    head -n1M > mb.in

  $ time LC_ALL=C.UTF-8 src/cut-before -c1 mb.in >/dev/null
  real    0m1.582s

  $ time LC_ALL=C.UTF-8 src/cut-after -c1 mb.in >/dev/null
  real    0m0.592s
thank you!
Padraig
