On Mon, 19 Nov 2007 14:46:50 +0100, "A. Pagaltzis" <pagalt...@gmx.de> wrote:
> * Michael G Schwern <schw...@pobox.com> [2007-11-19 10:25]:
> > A. Pagaltzis wrote:
> > > Reminds me, this is not the only GNU tool that needs such
> > > treatment. GNU grep pays attention to the locale as well, but
> > > its encoding decoder is apparently written in Visual Basic --
> > > if you use a UTF-8 locale, it will slow down by TWO ORDERS OF
> > > MAGNITUDE.
> > >
> > > $ time LC_CTYPE=en_US.utf8 grep -cq tes /usr/share/dict/words
> > >
> > > real 0m0.686s
> > > user 0m0.680s
> > > sys 0m0.004s
> > >
> > > $ time LC_CTYPE=C grep -cq tes /usr/share/dict/words
> > >
> > > real 0m0.006s
> > > user 0m0.004s
> > > sys 0m0.000s
> >
> > Are you sure you didn't just measure disk caching? I don't see any
> > different results between the two on OS X.
> Those measurements were with hot cache and are reliably
> reproducible on my machine.
> Possibly you need to set more locale variables; I also have LANG
> set. (The "funny" thing is I had LC_COLLATE set to `C` already,
> so grep should not be doing any decoding *anyway*.)
> Or your GNU utils have been compiled with other switches. Or
> something.
Yet another reason to make ––disable–nls the default for such basic tools
(don't paste that option, it might contain UTF8 :p)
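The sketch below is one way to rerun the comparison without relying on /usr/share/dict/words or on which locale variables happen to be set. LC_ALL overrides LANG and every LC_* variable, so setting only LC_ALL removes the "more locale variables" question. The generated word list and the locale name en_US.UTF-8 are assumptions. Use a UTF-8 locale that `locale -a` actually lists on your system.

```shell
#!/usr/bin/env bash
# Sketch: time grep under the C locale and under a UTF-8 locale.
# Assumptions: bash (for the `time` keyword), a grep on PATH, and
# the locale name en_US.UTF-8. If that locale is not installed,
# grep falls back to C and both timings should come out about the same.
set -eu

words=$(mktemp)
trap 'rm -f "$words"' EXIT

# Hypothetical test data, used because /usr/share/dict/words is not
# installed everywhere: 200000 lines, of which 100000 contain "tes".
seq 1 100000 | awk '{ print "tested" $1; print "word" $1 }' > "$words"

for loc in C en_US.UTF-8; do
  echo "LC_ALL=$loc"
  # LC_ALL takes precedence over LANG and LC_CTYPE/LC_COLLATE/...,
  # so this one assignment decides which locale grep runs under.
  # The count goes to stdout. The timing from `time` goes to stderr.
  time LC_ALL=$loc grep -c tes "$words"
done
```

Only the relative timings matter here. Absolute numbers depend on the machine, the grep version, and how grep was built.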
--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org
http://www.goldmark.org/jeff/stupid-disclaimers/