On 26/03/2026 18:09, Paul Eggert wrote:
On 3/26/26 07:57, Bruno Haible wrote:
It's brittle because we asked the question "Which optimizations can we build
into Gnulib, so that programs that stick to the POSIX API get accelerated?"
We did not have the courage to switch from a POSIX API to a Gnulib-only API.
And the POSIX API, unfortunately, is based on a hidden static locale (that
is not even available as a 'locale_t').
That's indeed unfortunate. Can we get some of the benefit of a
Gnulib-only API by doing something like this on platforms like glibc
where it would be a performance win?
mbrtowc_func_t _gl_mbrtowc;
#define mbrtowc _gl_mbrtowc
and similarly for mbrtoc32? The idea would be to initialize _gl_mbrtowc
and _gl_mbrtoc32 near the start of each Coreutils etc. app, after the
locale is determined. This would be like the factory approach you
mentioned a while ago, but a bit less intrusive because it would need
just one setup function to be called near the start.
Failing that, it sounds like we should look into a Gnulib-only API and
modify mcel.h etc. to use that API.
Yes, being as unintrusive as possible would be great.
There are very few direct uses of mbrto{wc,32} in coreutils at least,
with wc being the most performance critical one.
Hiding complexity like this in mcel is a useful idea.
I know optimizing per-character interfaces like this
is a little bit of premature optimization, as for perf
a different approach like buffer scanning is preferable.
But it does seem significant wins are available with simple adjustments.
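
For illustration, the setup-function idea quoted above could be sketched
roughly as follows. This is only a sketch under assumptions: the names
gl_mbrtowc_setup, single_byte_mbrtowc and count_chars are hypothetical (not
existing Gnulib functions), the fast path assumes the "C"/"POSIX" locale maps
each byte to one character, and callers use the pointer directly instead of
going through a "#define mbrtowc _gl_mbrtowc" redirect.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Pointer type with the same signature as mbrtowc.  */
typedef size_t (*mbrtowc_func_t) (wchar_t *, const char *, size_t,
                                  mbstate_t *);

/* Hypothetical fast path: assumes every byte is one character, which
   is only claimed here for the "C"/"POSIX" locale.  */
static size_t
single_byte_mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
{
  (void) ps;                    /* a single-byte encoding has no state */
  if (s == NULL)
    return 0;                   /* like mbrtowc (NULL, "", 1, ps) */
  if (n == 0)
    return (size_t) -2;         /* no bytes available yet */
  unsigned char c = (unsigned char) *s;
  if (pwc != NULL)
    *pwc = c;
  return c != 0;                /* 0 for the null character, else 1 */
}

/* Defaults to the libc function, so code that never calls the setup
   function still behaves correctly.  */
mbrtowc_func_t _gl_mbrtowc = mbrtowc;

/* Hypothetical setup function: call once, after setlocale.  */
void
gl_mbrtowc_setup (void)
{
  const char *name = setlocale (LC_CTYPE, NULL);
  if (name != NULL
      && (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0))
    _gl_mbrtowc = single_byte_mbrtowc;
  else
    _gl_mbrtowc = mbrtowc;
}

/* wc-like character count that goes through the pointer.  Invalid or
   incomplete input is counted one byte at a time (a simplification).  */
size_t
count_chars (const char *buf, size_t len)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t count = 0;
  while (len > 0)
    {
      wchar_t wc;
      size_t n = _gl_mbrtowc (&wc, buf, len, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        {
          n = 1;                /* skip one byte and resynchronize */
          memset (&state, 0, sizeof state);
        }
      else if (n == 0)
        n = 1;                  /* the null character occupies one byte */
      buf += n;
      len -= n;
      count++;
    }
  return count;
}
```

An app would call setlocale (LC_ALL, "") and then gl_mbrtowc_setup () once
near the start of main. Everything else stays unchanged, which is what keeps
the approach unintrusive.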
cheers,
Padraig