Pádraig Brady wrote:
> $ yes áááááááááááááááááááá | head -n100000 > mbc.txt
> $ yes 12345678901234567890 | head -n100000 > num.txt
>
> ===== Before ====
>
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real 0m0.186s
>
> $ time src/wc -m < mbc.txt
> 2100000
> real 0m0.186s
> Now I see we may be replacing wcwidth() on OSX as there are issues
> with OSX handling of combining characters in UTF-8.
> So maybe the slow down is with the gnulib wcwidth!?
> To test that I did:
>
> $ gl_cv_func_wcwidth_works=no ./configure --quiet
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real 0m0.225s

When I profile this (on a glibc system, with gl_cv_func_wcwidth_works=no)
using valgrind + kcachegrind, I obtain the attached output. My interpretation:

* rpl_wcwidth is 2.5 times slower than the native glibc wcwidth.
* uc_width itself is OK; it's the locale_charset call that is eating the
  time. Which is silly, since the locale does not change while 'wc' is
  running.

To improve this, it would be good if gnulib implemented a wcwidth_l
function that takes a locale_t object as argument. The step from the
locale_t to the lookup table used by uc_width would be faster than the
current sequence of nl_langinfo_l and locale_charset.

I'm not sure, though, that this can be realized:
- locale_t objects are not extensible.
- #ifs are needed to accommodate platforms that don't have 'locale_t'
  at all.

Bruno
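
P.S. For concreteness, here is a rough sketch of what such a helper could
look like, assuming gnulib's uc_width() from <uniwidth.h>, a platform that
has locale_t and nl_langinfo_l(), and wchar_t values that are Unicode code
points (as on glibc). The function name and the one-entry cache are only
illustrative: instead of going from the locale_t straight to the uc_width
lookup table, it merely amortizes the nl_langinfo_l call, which is already
enough to take the per-character locale_charset work out of the loop.
Hanging the table pointer off the locale_t itself is exactly the part that
the non-extensibility of locale_t prevents.

/* Rough sketch only -- not gnulib code.  Assumes <uniwidth.h> from
   gnulib/libunistring, a platform with locale_t and nl_langinfo_l(),
   and wchar_t values that are Unicode code points (as on glibc).  */

#include <langinfo.h>   /* nl_langinfo_l, CODESET */
#include <locale.h>     /* locale_t */
#include <wchar.h>      /* wchar_t */
#include <uniwidth.h>   /* uc_width, ucs4_t */

int
wcwidth_l_sketch (wchar_t wc, locale_t loc)
{
  /* Remember the encoding of the most recently seen locale_t, so the
     nl_langinfo_l lookup is not repeated for every character.  Not
     thread-safe; a real implementation would want to attach the lookup
     table to the locale_t itself, which locale_t's non-extensibility
     forbids.  */
  static locale_t cached_loc;
  static const char *cached_encoding;

  if (loc != cached_loc || cached_encoding == NULL)
    {
      cached_encoding = nl_langinfo_l (CODESET, loc);
      cached_loc = loc;
    }

  return uc_width ((ucs4_t) wc, cached_encoding);
}

Called with the same locale_t throughout wc's inner loop (e.g. one obtained
from newlocale), the cache hits on every character after the first.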
Attachment: callgrind.out.8022.png (application/kcachegrind)