On 9/17/07, Douglas A. Gwyn <[EMAIL PROTECTED]> wrote:
> erik quanstrom wrote:
> > i think the devolution of gnu grep is quite instructive.  ...
> > it gets to the heart of why plan9's invention and use (thanks rob, ken) of
> > utf-8 is so great.
>
> If the problem is that Gnu grep converts any non-8-bit character set
> to wchar_t (the equivalent of Plan 9 "rune"), then it's not really a
> fair criticism of the software.  The conversion approach handles a
> wide variety of character encoding schemes, whereas grepping the
> encodings directly (the fast approach) doesn't work well for many
> non-UTF-8 encodings.

Well, on a 2GHz x86, gnu grep ran for me at about 9600 baud on an
ASCII file if I set my locale to the UTF-8 locale.  UTF-8 is ASCII
compatible - explicitly, publicly, and on purpose - so there is no
excuse for this sort of performance penalty.  To be specific, in
the UTF-8 locale it should take just a few instructions to convert
any character to wchar_t, ASCII or not, but gnu grep was calling
malloc for this, even for an ASCII byte.
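
To illustrate the "few instructions" claim, here is a minimal sketch of
an allocation-free UTF-8 decoder. It is not gnu grep's code and not the
Plan 9 library routine; the name utf8_to_wc and its signature are
hypothetical. It assumes wchar_t is wide enough to hold any code point
(true for 32-bit wchar_t, as on typical Unix systems), and for brevity
it does not reject overlong forms or surrogate values.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence from s (n bytes
 * available) into *wc.  Returns the number of bytes consumed, or 0
 * if the input is empty, truncated, or malformed.  No allocation.
 */
size_t
utf8_to_wc(const unsigned char *s, size_t n, wchar_t *wc)
{
	unsigned long c;
	size_t i, len;

	if (n == 0)
		return 0;
	c = s[0];
	if (c < 0x80) {			/* ASCII: one compare, one store */
		*wc = (wchar_t)c;
		return 1;
	}
	/* The lead byte gives the sequence length and the top bits. */
	if ((c & 0xE0) == 0xC0) {
		len = 2;
		c &= 0x1F;
	} else if ((c & 0xF0) == 0xE0) {
		len = 3;
		c &= 0x0F;
	} else if ((c & 0xF8) == 0xF0) {
		len = 4;
		c &= 0x07;
	} else
		return 0;		/* stray continuation or invalid lead byte */
	if (n < len)
		return 0;		/* truncated sequence */
	/* Each continuation byte (10xxxxxx) contributes six more bits. */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	*wc = (wchar_t)c;
	return len;
}
```

For an ASCII byte the cost is one comparison and one store; the longer
forms add a handful of masks and shifts per byte.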

It is a fair criticism to say this is unacceptable, whatever the
intentions of the authors may be.

-rob
