On 18/05/18 14:06, Eric Fischer wrote:
> For whatever it's worth, the system wcwidth seems to be much faster on my
> MacOS X system (10.11.6) than the replacement wcwidth. Using the same
> benchmark as above, it takes about 0.9 seconds with the replacement wcwidth:
>
> $ yes áááááááááááááááááááá | head -n100000 > mbc.txt
> $ yes 12345678901234567890 | head -n100000 > num.txt
>
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real 0m1.004s
>
> $ time src/wc -m < mbc.txt
> 2100000
> real 0m0.909s
>
> $ time src/wc -Lm < num.txt
> 2100000 20
> real 0m0.903s
>
> $ time src/wc -m < num.txt
> 2100000
> real 0m0.887s
>
> and about 0.03 or 0.09 seconds with the system wcwidth (tested by adding
> return wcwidth (wc); to the top of the lib/wcwidth.c replacement):
>
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real 0m0.098s
>
> $ time src/wc -m < mbc.txt
> 2100000
> real 0m0.088s
>
> $ time src/wc -Lm < num.txt
> 2100000 20
> real 0m0.038s
>
> $ time src/wc -m < num.txt
> 2100000
> real 0m0.032s
>
> Unfortunately the replacement wcwidth is probably necessary for correct text
> measuring. The original MacOS X 10.3 bug where COMBINING ACUTE ACCENT
> reported a width of 1 instead of 0 appears to be fixed, but two other bugs
> that the m4/wcwidth.m4 test looks for (HEBREW POINT SHEVA and ZERO WIDTH
> SPACE reporting widths of 1 instead of 0) appear to still be current.

Interesting. On the off chance it might have been clang I checked with:

  gl_cv_func_wcwidth_works=no CC=clang ./configure --quiet

and still got fast results with uc_width() on glibc.

Now the gnulib replacement is only table lookup and some bit manipulation.
Ah it also calls locale_charset()! That must be slow on OSX. Indeed :(

https://lists.gnu.org/archive/html/bug-gnulib/2015-01/msg00040.html
https://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00000.html

I see some recent improvement (which the latest coreutils git should reference):

https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00057.html

It still would be nice to get appropriate caching here.

cheers,
Pádraig
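
To illustrate the caching idea mentioned above, here is a minimal sketch, not
the gnulib implementation. It assumes that the expensive step is the
per-character charset query and that the locale does not change after the
first call. The names lookup_charset, charset_is_utf8_cached and
charset_lookups are hypothetical, and nl_langinfo (CODESET) stands in for
locale_charset() so that the sketch builds without gnulib.

```c
#include <langinfo.h>
#include <string.h>

/* Hypothetical counter, present only so the effect of caching can be
   observed from outside.  */
int charset_lookups = 0;

/* Stand-in for the expensive charset query.  gnulib's wcwidth replacement
   calls locale_charset(); the POSIX nl_langinfo (CODESET) is used here
   instead, which is an assumption made for self-containment.  */
const char *
lookup_charset (void)
{
  charset_lookups++;
  return nl_langinfo (CODESET);
}

/* Return 1 if the current charset is named "UTF-8", else 0.
   The answer is computed on the first call and reused afterwards.
   ASSUMPTION: the locale is not changed after the first call.  A real
   fix would need some way to invalidate the cache, and this sketch is
   not thread-safe.  */
int
charset_is_utf8_cached (void)
{
  static int cached = -1;       /* -1 means "not computed yet" */

  if (cached < 0)
    cached = (strcmp (lookup_charset (), "UTF-8") == 0);
  return cached;
}
```

A per-character function such as a wcwidth replacement could then test
charset_is_utf8_cached () on each call, so the charset query runs once
rather than once per character. Whether that is acceptable depends on how
callers use setlocale(), which is the hard part of "appropriate" caching.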
