On 26/03/2026 20:06, Paul Eggert wrote:
On 3/26/26 13:03, Pádraig Brady wrote:
Well I'm not too sure about all this,
but the replacement seems to be due to mbrtowc and mbrtoc32 failing with EILSEQ
in the C locale
(as mentioned in the comment above the 3 lines referenced above in lib/mcel.h).
OK; does simply removing the "#undef mbrtoc32" from mcel.h give the
performance boost? Or do you also need to tell 'configure' that mbrtowc
is missing? And if the latter, why don't you also need to tell
'configure' that mbrtoc32 is missing?
Yes that's confusing. I hadn't analyzed it,
but yes you also need to force ac_cv_func_mbrtowc=no.
Looking now, I see that mcel uses mbrtoc32 which then calls
into the system mbrtoc32.
When you configure with ac_cv_func_mbrtowc=no it falls through
to lib/mbrtowc-impl.h (and hence the efficient dispatch etc.)
I.e. the #if GNULIB_defined_mbstate_t /* AIX */ case is used,
allowing for the faster dispatch.
With ./configure ac_cv_func_mbrtowc=no we get the following (and faster
operation):
HAVE_MBRTOC16='1'
HAVE_MBRTOC32='1'
HAVE_MBRTOWC='0'
REPLACE_MBRTOC16='0'
REPLACE_MBRTOC32='1'
REPLACE_MBRTOWC='1'
It sounds like we need to modify m4/mbrtowc.m4 so that it replaces
mbrtowc on glibc. Maybe similarly for mbrtoc32?
Yes it's confusing at least.
The setup of the various mbrto.. replacements is a bit intricate.
In the attached I adjusted things so that the efficient
dispatch routines are used once the wchar-single module is referenced.
I'm not sure about this approach, but it works with coreutils
on glibc-2.43 at least, and cut -c (mcel) is 2.6x faster,
and wc -m (mbrtoc32) is 2x faster.
cheers,
Padraig
From 90cd713224c3326240a234cbc10afe307dc591ab Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Thu, 26 Mar 2026 23:46:09 +0000
Subject: [PATCH] wchar-single: ensure efficient dispatch is used
* lib/lc-charset-dispatch.c: Also use with GNULIB_WCHAR_SINGLE_LOCALE.
* lib/lc-charset-dispatch.h: Likewise.
* lib/mbrtoc32.c: Likewise.
* lib/mbrtowc.c: Likewise.
* lib/mcel.h: Ensure we use the replaced mbrtoc32.
* modules/wchar-single: Ensure we link the dispatch routines.
---
lib/lc-charset-dispatch.c | 2 +-
lib/lc-charset-dispatch.h | 2 +-
lib/mbrtoc32.c | 2 +-
lib/mbrtowc.c | 2 +-
lib/mcel.h | 7 -------
modules/wchar-single | 1 +
6 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/lib/lc-charset-dispatch.c b/lib/lc-charset-dispatch.c
index 09f83f1887..5f199ce453 100644
--- a/lib/lc-charset-dispatch.c
+++ b/lib/lc-charset-dispatch.c
@@ -21,7 +21,7 @@
/* Specification. */
#include "lc-charset-dispatch.h"
-#if GNULIB_defined_mbstate_t
+#if GNULIB_WCHAR_SINGLE_LOCALE || GNULIB_defined_mbstate_t
# include "localcharset.h"
# include "streq-opt.h"
diff --git a/lib/lc-charset-dispatch.h b/lib/lc-charset-dispatch.h
index 665da40855..fc1366a0c9 100644
--- a/lib/lc-charset-dispatch.h
+++ b/lib/lc-charset-dispatch.h
@@ -18,7 +18,7 @@
#include <wchar.h>
-#if GNULIB_defined_mbstate_t
+#if GNULIB_WCHAR_SINGLE_LOCALE || GNULIB_defined_mbstate_t
/* A classification of special values of the encoding of the current locale. */
typedef enum
diff --git a/lib/mbrtoc32.c b/lib/mbrtoc32.c
index f3570f73ff..b4cf353a70 100644
--- a/lib/mbrtoc32.c
+++ b/lib/mbrtoc32.c
@@ -30,7 +30,7 @@
# include "lc-charset-unicode.h"
#endif
-#if GNULIB_defined_mbstate_t /* AIX */
+#if GNULIB_WCHAR_SINGLE_LOCALE || GNULIB_defined_mbstate_t
/* Implement mbrtoc32() on top of mbtowc() for the non-UTF-8 locales
and directly for the UTF-8 locales. */
diff --git a/lib/mbrtowc.c b/lib/mbrtowc.c
index 4983170688..7a32397a4a 100644
--- a/lib/mbrtowc.c
+++ b/lib/mbrtowc.c
@@ -20,7 +20,7 @@
/* Specification. */
#include <wchar.h>
-#if GNULIB_defined_mbstate_t
+#if GNULIB_WCHAR_SINGLE_LOCALE || GNULIB_defined_mbstate_t
/* Implement mbrtowc() on top of mbtowc() for the non-UTF-8 locales
and directly for the UTF-8 locales. */
diff --git a/lib/mcel.h b/lib/mcel.h
index 757a97593f..5eedd5b610 100644
--- a/lib/mcel.h
+++ b/lib/mcel.h
@@ -217,13 +217,6 @@ mcel_isbasic (char c)
return _GL_LIKELY (0 <= c && c < MCEL_ERR_MIN);
}
-/* With mcel there should be no need for the performance overhead of
- replacing glibc mbrtoc32, as callers shouldn't care whether the
- C locale treats a byte with the high bit set as an encoding error. */
-#ifdef __GLIBC__
-# undef mbrtoc32
-#endif
-
/* Scan bytes from P inclusive to LIM exclusive. P must be less than LIM.
Return the character or encoding error starting at P. */
MCEL_INLINE mcel_t
diff --git a/modules/wchar-single b/modules/wchar-single
index 6545c83bde..dedc3d3ead 100644
--- a/modules/wchar-single
+++ b/modules/wchar-single
@@ -13,6 +13,7 @@ AC_DEFINE([GNULIB_WCHAR_SINGLE_LOCALE], [1],
where we know the locale charset will not change between calls.])
dnl For backward compatibility:
gl_MODULE_INDICATOR([wchar-single])
+AC_LIBOBJ([lc-charset-dispatch])
Makefile.am:
--
2.53.0