I was doing some performance testing on cut(1) and noticed
surprisingly slow per-character iteration in cut -c1 (new code using lib/mcel).
Then I noticed the same performance issue with wc -m.
This occurs only with non-ASCII characters, as both wc and lib/mcel have
shortcuts for ASCII and defer to mbrtoc32() only for multi-byte characters.

Bruno, you originally identified this inefficiency at:
https://lists.gnu.org/r/bug-gnulib/2018-05/msg00173.html
i.e., that it's best to avoid glibc's mbrtowc() so that we can
use gnulib's cached-dispatch version.

We currently always replace mbrtowc() on glibc,
but wc was changed to use mbrtoc32() in coreutils v9.4-37-g14d35d5ba,
so I think it has been taking the slower path since then.


So in summary, if I now run ./configure ac_cv_func_mbrtowc=no
(noting that coreutils already does
 AC_DEFINE([GNULIB_WCHAR_SINGLE_LOCALE], [1])),
I get a faster wc -m:

  $ time src/wc-before -m mb.in
  66060288 mb.in
  real  0m2.717s

  $ time src/wc -m mb.in
  66060288 mb.in
  real  0m1.232s


If I also remove these lines from lib/mcel.h,
with the above configure variable still set,
I get a faster cut -c:

-#ifdef __GLIBC__
-# undef mbrtoc32
-#endif

  $ time src/cut-before -c1 mb.in >/dev/null
  real  0m1.589s

  $ time src/cut -c1 mb.in >/dev/null
  real  0m0.626s

Paul, it seems we should not second-guess the mbrtoc32 configuration,
especially as the default can be so much slower?


Now this is all quite brittle, but I'm unsure of the best fix.
For coreutils, ideally configure would automatically ensure that
both mbrtowc and mbrtoc32 use the gnulib replacement with cached dispatch.

Note that these per-character interfaces are best avoided anyway,
but until we do that, we might as well get the significant wins
available from the code that's already in place.

thanks,
Padraig
