On Mon, 19 Nov 2007 14:46:50 +0100, "A. Pagaltzis" <pagalt...@gmx.de> wrote:

* Michael G Schwern <schw...@pobox.com> [2007-11-19 10:25]:
> A. Pagaltzis wrote:
> > Reminds me, this is not the only GNU tool that needs such
> > treatment. GNU grep pays attention to the locale as well, but
> > its encoding decoder is apparently written in Visual Basic --
> > if you use a UTF-8 locale, it will slow down by TWO ORDERS OF
> > MAGNITUDE.
> >
> >     $ time LC_CTYPE=en_US.utf8 grep -cq tes /usr/share/dict/words
> >
> >     real    0m0.686s
> >     user    0m0.680s
> >     sys     0m0.004s
> >
> >     $ time LC_CTYPE=C grep -cq tes /usr/share/dict/words
> >
> >     real    0m0.006s
> >     user    0m0.004s
> >     sys     0m0.000s
>
> Are you sure you didn't just measure disk caching? I don't see any
> different results between the two on OS X.

Those measurements were with hot cache and are reliably
reproducible on my machine.

Possibly you need to set more locale variables; I also have LANG
set. (The "funny" thing is I had LC_COLLATE set to `C` already,
so grep should not be doing any decoding *anyway*.)

Or your GNU utils have been compiled with other switches. Or
something.
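
For anyone who wants to rule out the cache and the locale variables
when reproducing this, here is a minimal sketch. The function names
are made up for illustration. It relies on two things that hold on
POSIX systems: LC_ALL overrides both LC_CTYPE and LANG, and
`locale charmap` reports the encoding actually in effect. Whether a
locale named en_US.utf8 exists on a given machine is an assumption;
check `locale -a` first.

```shell
# Hypothetical helpers for repeating the comparison with fewer unknowns.

# warm_cache FILE: read the file once so later runs hit the page cache.
warm_cache() {
    cat "$1" > /dev/null
}

# charmap_in_locale LOCALE: print the character encoding in effect when
# LC_ALL is forced to LOCALE (LC_ALL wins over LC_CTYPE and LANG).
charmap_in_locale() {
    LC_ALL="$1" locale charmap
}

# grep_in_locale LOCALE PATTERN FILE: count matching lines with the
# locale forced through LC_ALL, so no other variable can interfere.
grep_in_locale() {
    LC_ALL="$1" grep -c "$2" "$3"
}
```

In bash, where `time` is a keyword and can time a function call, run
`warm_cache /usr/share/dict/words` once and then compare
`time grep_in_locale C tes /usr/share/dict/words` with
`time grep_in_locale en_US.utf8 tes /usr/share/dict/words`. If
`charmap_in_locale` prints the same thing for both names, the UTF-8
locale was never really selected and the timings will not differ.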

Yet another reason to make ––disable–nls default for such basic tools
(don't paste that option, it might contain UTF8 :p)

--
H.Merijn Brand         Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.10.x  on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin.       http://qa.perl.org
http://mirrors.develooper.com/hpux/            http://www.test-smoke.org
                       http://www.goldmark.org/jeff/stupid-disclaimers/
