On 9/17/07, Douglas A. Gwyn <[EMAIL PROTECTED]> wrote:
> erik quanstrom wrote:
> > i think the devolution of gnu grep is quite instructive. ...
> > it gets to the heart of why plan9's invention and use (thank's rob, ken) of
> > utf-8 is so great.
>
> If the problem is that Gnu grep converts any non-8-bit character set
> to wchar_t (the equivalent of Plan 9 "rune"), then it's not really a
> fair criticism of the software. The conversion approach handles a
> wide variety of character encoding scheme, whereas grepping the
> encodings directly (the fast approach) doesn't work well for many
> non-UTF-8 encodings.

Well, on a 2GHz x86, gnu grep ran for me at about 9600 baud on an
ASCII file if I set my locale to the UTF-8 locale. UTF-8 is ASCII
compatible - explicitly, publicly, and on purpose - so there is no
excuse for this sort of performance penalty.

To be specific, in the UTF-8 locale it should take just a few
instructions to convert any character to wchar_t, ASCII or not, but
gnu grep was calling malloc for this, even for an ASCII byte. It is a
fair criticism to say this is unacceptable, whatever the intentions of
the authors may be.

-rob
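
To illustrate the "few instructions" point, here is a minimal sketch of decoding one UTF-8 sequence into a code point with no allocation. It is not the code from gnu grep or from Plan 9; the name utf8_decode and its return convention are hypothetical, chosen only for this example, and it uses uint32_t rather than wchar_t so that it does not depend on the platform's wchar_t width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration, not grep's or
 * Plan 9's code): decode one UTF-8 sequence from s, with n bytes
 * available, into *out. Returns the number of bytes consumed, or 0 if
 * the input is empty, truncated, or malformed. No malloc, no locale
 * lookup. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    if (n == 0)
        return 0;

    unsigned char c = s[0];

    /* ASCII fast path: one compare, one store. */
    if (c < 0x80) {
        *out = c;
        return 1;
    }

    size_t len;
    uint32_t cp, min;

    /* The lead byte gives the sequence length and the top payload bits.
     * min is the smallest code point that legitimately needs this
     * length; anything below it is an overlong encoding. */
    if ((c & 0xE0) == 0xC0) {
        len = 2; cp = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
        len = 3; cp = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {
        len = 4; cp = c & 0x07; min = 0x10000;
    } else {
        return 0;   /* stray continuation byte or invalid lead byte */
    }

    if (n < len)
        return 0;   /* truncated sequence */

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return len;
}
```

A caller would walk a buffer by calling utf8_decode repeatedly and advancing by the returned length. For an all-ASCII file every call takes the first branch, which is the case being complained about above.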
