Hi Pádraig,

> I was doing some performance testing on cut(1) and noticed
> surprisingly slow per character iteration in cut -c1 (new code using 
> lib/mcel).
> Then I noticed the same performance issue with wc -m.
> This was only with non-ASCII chars as both wc and lib/mcel have
> shortcuts for ASCII, only deferring to mbrtoc32() for multi-byte.
> 
> Bruno you originally identified this inefficiency at:
> https://lists.gnu.org/r/bug-gnulib/2018-05/msg00173.html

But the proposed solution with factories (a factory that produces an
mbrtowc-like function for a given locale, and likewise for wcwidth)
never made it into Gnulib; it still sits on a local branch in my
gnulib checkout.
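
To illustrate the idea (this is only a rough sketch, not the code from
that branch; decoder_fn, decoder_for_codeset and count_chars are
hypothetical names, and the factory is keyed on a codeset name here
rather than on a real locale object): the caller asks once for a
decoder and then calls it per character, with no per-call locale
lookup. Unlike mbrtowc, the sketch is stateless and returns 1 for a
NUL byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mbrtowc-like decoder: stores one character in *pc and
   returns the number of bytes consumed (>= 1), or (size_t) -1 for an
   invalid sequence, or (size_t) -2 for an incomplete one.  */
typedef size_t (*decoder_fn) (uint32_t *pc, const char *s, size_t n);

static size_t
decode_utf8 (uint32_t *pc, const char *s, size_t n)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t len, i;
  uint32_t wc, min;

  if (n == 0)
    return (size_t) -2;
  if (p[0] < 0x80)
    {
      *pc = p[0];
      return 1;
    }
  else if (p[0] >= 0xC2 && p[0] <= 0xDF)
    { len = 2; wc = p[0] & 0x1F; min = 0x80; }
  else if ((p[0] & 0xF0) == 0xE0)
    { len = 3; wc = p[0] & 0x0F; min = 0x800; }
  else if (p[0] >= 0xF0 && p[0] <= 0xF4)
    { len = 4; wc = p[0] & 0x07; min = 0x10000; }
  else
    return (size_t) -1;

  for (i = 1; i < len; i++)
    {
      if (i == n)
        return (size_t) -2;             /* input ends mid-character */
      if ((p[i] & 0xC0) != 0x80)
        return (size_t) -1;             /* not a continuation byte */
      wc = (wc << 6) | (p[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates, and values beyond U+10FFFF.  */
  if (wc < min || wc > 0x10FFFF || (wc >= 0xD800 && wc <= 0xDFFF))
    return (size_t) -1;
  *pc = wc;
  return len;
}

/* The "factory": choose the decoder once, outside the hot loop.
   Returns NULL when there is no specialized decoder, in which case
   the caller would fall back to mbrtoc32.  */
decoder_fn
decoder_for_codeset (const char *codeset)
{
  if (strcmp (codeset, "UTF-8") == 0)
    return decode_utf8;
  return NULL;
}

/* A wc -m style loop; an undecodable byte counts as one character.  */
size_t
count_chars (decoder_fn decode, const char *buf, size_t n)
{
  size_t count = 0;
  while (n > 0)
    {
      uint32_t c;
      size_t len = decode (&c, buf, n);
      if (len == (size_t) -1 || len == (size_t) -2)
        len = 1;
      buf += len;
      n -= len;
      count++;
    }
  return count;
}
```

A caller would typically pass nl_langinfo (CODESET) after setlocale,
once at startup, and keep the returned pointer for the whole run.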

> I.e. that it's best to avoid glibc's mbrtowc() so we can
> use gnulib's cached dispatch version.
> 
> We do replace mbrtowc() on glibc always currently,
> but wc was changed to using mbrtoc32() in coreutils v9.4-37-g14d35d5ba
> which thus took the slower path since then I think.
> 
> 
> So in summary if I now ./configure ac_cv_func_mbrtowc=no
> (noting that coreutils already does
>   AC_DEFINE([GNULIB_WCHAR_SINGLE_LOCALE], [1]):
> I get faster wc -m:
> 
>    $ time src/wc-before -m mb.in
>    66060288 mb.in
>    real       0m2.717s
> 
>    $ time src/wc -m mb.in
>    66060288 mb.in
>    real       0m1.232s
> 
> 
> If I remove these lines from lib/mcel.h
> and also have the above configure var set
> I get faster cut -c:
> 
> -#ifdef __GLIBC__
> -# undef mbrtoc32
> -#endif
> 
>    $ time src/cut-before -c1 mb.in >/dev/null
>    real       0m1.589s
> 
>    $ time src/cut -c1 mb.in >/dev/null
>    real       0m0.626s
> 
> Paul it seems like we should not try to second guess the mbrtoc32 config,
> especially as the default can be so significantly slower?
> 
> 
> Now this is all quite brittle

It's brittle because we asked the question "Which optimizations can we build
into Gnulib, so that programs that stick to the POSIX API get accelerated?"
We did not have the courage to switch from a POSIX API to a Gnulib-only API.
And the POSIX API, unfortunately, is based on a hidden static locale (that
is not even available as a 'locale_t').

Bruno



