I was doing some performance testing on cut(1) and noticed surprisingly slow per-character iteration in cut -c1 (new code using lib/mcel). Then I noticed the same performance issue with wc -m. This only happens with non-ASCII characters, as both wc and lib/mcel have shortcuts for ASCII and only defer to mbrtoc32() for multi-byte characters.
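
For readers unfamiliar with the pattern, here is a minimal sketch of that kind of loop: ASCII bytes are counted directly and only other bytes go through mbrtoc32(). This is not the actual wc or lib/mcel code. The function name count_chars is hypothetical, and the fast path assumes an encoding such as UTF-8 where bytes below 0x80 are always complete characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper, not taken from coreutils or gnulib:
   count the characters in BUF[0..LEN-1], roughly as wc -m would.  */
static size_t
count_chars (const char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t nchars = 0;
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      if (c < 0x80)
        {
          /* Fast path: assume an ASCII byte is one whole character,
             so no library call is needed.  */
          i++;
        }
      else
        {
          /* Slow path: decode one multi-byte character.  This call is
             where the per-character cost discussed here is paid.  */
          char32_t ch;
          size_t n = mbrtoc32 (&ch, buf + i, len - i, &state);

          /* (size_t) -1 (invalid), (size_t) -2 (incomplete) and
             (size_t) -3 all compare greater than the remaining length.
             In those cases, and for a 0 return, consume a single byte
             and reset the conversion state.  */
          if (n == 0 || n > len - i)
            {
              memset (&state, 0, sizeof state);
              n = 1;
            }
          i += n;
        }
      nchars++;
    }

  return nchars;
}
```

A caller would normally run setlocale (LC_ALL, "") first so that mbrtoc32() decodes according to the user's locale. Counting each undecodable byte as one character is a choice made for this sketch only.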

Bruno, you originally identified this inefficiency at:
https://lists.gnu.org/r/bug-gnulib/2018-05/msg00173.html
I.e., that it's best to avoid glibc's mbrtowc() so we can use gnulib's cached dispatch version.

We do currently always replace mbrtowc() on glibc, but wc was changed to use mbrtoc32() in coreutils v9.4-37-g14d35d5ba, and so has taken the slower path since then, I think.

So in summary, if I now ./configure ac_cv_func_mbrtowc=no
(noting that coreutils already does AC_DEFINE([GNULIB_WCHAR_SINGLE_LOCALE], [1])),
I get faster wc -m:

  $ time src/wc-before -m mb.in
  66060288 mb.in
  real    0m2.717s

  $ time src/wc -m mb.in
  66060288 mb.in
  real    0m1.232s

If I remove these lines from lib/mcel.h, and also have the above configure variable set, I get faster cut -c:

  -#ifdef __GLIBC__
  -# undef mbrtoc32
  -#endif

  $ time src/cut-before -c1 mb.in >/dev/null
  real    0m1.589s

  $ time src/cut -c1 mb.in >/dev/null
  real    0m0.626s

Paul, it seems like we should not try to second-guess the mbrtoc32 config, especially as the default can be so significantly slower?

Now this is all quite brittle, but I'm unsure of the best fix. Perhaps coreutils could configure things automatically, ensuring that both mbrtowc and mbrtoc32 use the gnulib replacement/cached dispatch?

Note these per-character interfaces are best avoided anyway, but until we do that we might as well get significant wins from the code that's already in place.

thanks,
Padraig
