Pádraig Brady wrote:
> $ yes áááááááááááááááááááá | head -n100000 > mbc.txt
> $ yes 12345678901234567890 | head -n100000 > num.txt
>
> ===== Before ====
>
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real 0m0.186s
>
> $ time src/wc -m < mbc.txt
> 2100000
> real 0m0.186s
> Now I see we may be replacing wcwidth() on OSX as there are issues
> with OSX handling of combining characters in UTF-8.
> So maybe the slow down is with the gnulib wcwidth!?
> To test that I did:
>
> $ gl_cv_func_wcwidth_works=no ./configure --quiet
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real 0m0.225s

When I profile this (on a glibc system, with gl_cv_func_wcwidth_works=no)
using valgrind + kcachegrind, I obtain the attached output. My interpretation:

* rpl_wcwidth is 2.5 times slower than the native glibc wcwidth.
* uc_width itself is OK; it's the locale_charset call that is eating the
  time. Which is silly, since the locale does not change while 'wc' is
  running.

To improve this, it would be good if gnulib implemented a wcwidth_l
function that takes a locale_t object as argument. The step from the
locale_t to the lookup table used by uc_width would be faster than the
current sequence of nl_langinfo_l and locale_charset.

I'm not sure, though, that this can be realized:
- locale_t objects are not extensible.
- #ifs are needed to accommodate platforms that don't have 'locale_t'
  at all.

Bruno
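
P.S. For concreteness, here is a rough sketch of what such a helper could
look like, assuming gnulib's uc_width() from <uniwidth.h>, a platform that
has locale_t and nl_langinfo_l(), and wchar_t values that are Unicode code
points (as on glibc). The function name and the one-entry cache are only
illustrative: instead of going from the locale_t straight to the uc_width
lookup table, it merely amortizes the nl_langinfo_l call, which is already
enough to take the per-character locale_charset work out of the loop.
Hanging the table pointer off the locale_t itself is exactly the part that
the non-extensibility of locale_t prevents.

/* Rough sketch only -- not gnulib code.  Assumes <uniwidth.h> from
   gnulib/libunistring, a platform with locale_t and nl_langinfo_l(),
   and wchar_t values that are Unicode code points (as on glibc).  */

#include <langinfo.h>   /* nl_langinfo_l, CODESET */
#include <locale.h>     /* locale_t */
#include <wchar.h>      /* wchar_t */
#include <uniwidth.h>   /* uc_width, ucs4_t */

int
wcwidth_l_sketch (wchar_t wc, locale_t loc)
{
  /* Remember the encoding of the most recently seen locale_t, so the
     nl_langinfo_l lookup is not repeated for every character.  Not
     thread-safe; a real implementation would want to attach the lookup
     table to the locale_t itself, which locale_t's non-extensibility
     forbids.  */
  static locale_t cached_loc;
  static const char *cached_encoding;

  if (loc != cached_loc || cached_encoding == NULL)
    {
      cached_encoding = nl_langinfo_l (CODESET, loc);
      cached_loc = loc;
    }

  return uc_width ((ucs4_t) wc, cached_encoding);
}

Called with the same locale_t throughout wc's inner loop (e.g. one obtained
from newlocale), the cache hits on every character after the first.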
Attachment: callgrind.out.8022.png (application/kcachegrind)