In article <[EMAIL PROTECTED]> you write:
>the gnu awk folks are doing a pretty good job, given their constraints.

Thanks!  I try, I really do.

>i have not read the sed code (for a while, anyway), but i could imagine
>that it may have the same character set problems as newer versions of gnu grep.
>gnu grep calls mbtowc for each input character, even when not required.
>
>have you tried your test with LC_LANG=C?

Make that LC_ALL=C and you'll be on track. (FWIW, the CVS grep is much
better than the released version; they've been working on this problem.)
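
For anyone following along, here is a minimal sketch of why LC_LANG=C
has no effect while LC_ALL=C does. POSIX looks at LC_ALL first, then
the per-category variable (LC_CTYPE, LC_COLLATE, ...), then LANG;
LC_LANG is not in that list at all. The helper names below are
hypothetical and not taken from gawk, grep, or libc, and the final
fallback to "C" is an assumption (POSIX leaves that default up to the
implementation).

```c
#include <stdlib.h>

/* Return the value of an environment variable, or NULL if it is
 * unset or set to the empty string (POSIX treats both the same). */
static const char *nonempty_env(const char *name)
{
    const char *v = getenv(name);
    return (v != NULL && v[0] != '\0') ? v : NULL;
}

/*
 * Hypothetical helper: report which locale name applies to a category
 * such as "LC_CTYPE", judging only by the environment.
 * Precedence: LC_ALL, then the category variable, then LANG.
 * Falling back to "C" is an assumption; the real default is
 * implementation-defined.
 */
const char *effective_locale(const char *category)
{
    const char *v;

    if ((v = nonempty_env("LC_ALL")) != NULL)
        return v;               /* overrides everything else */
    if ((v = nonempty_env(category)) != NULL)
        return v;               /* e.g. LC_CTYPE */
    if ((v = nonempty_env("LANG")) != NULL)
        return v;               /* general default */
    return "C";                 /* assumed fallback */
}
```

With LANG=en_US.UTF-8 and LC_LANG=C in the environment,
effective_locale("LC_CTYPE") still returns "en_US.UTF-8"; setting
LC_ALL=C is what makes it return "C".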

And yes, the locale stuff is a *N*I*G*H*T*M*A*R*E*.  Much of the heavy
lifting was done by others for the dfa and regex code, but I've done
my share to get it working too, and I must admit it's often a PITA.

Almost always the differences in behavior between LC_ALL=C and
LC_ALL=xxx.UTF-8 are due to the locale definitions, not to gawk's
handling of UTF-8
characters.  That all happens in the (GNU) library, below the level
where I can do anything about it.

OTOH, when I get fan mail from people in China and other such places
who are able to *get their work done* using gawk, it makes things
much more worthwhile.

And, to completely change the subject, if anyone on this list wants to
hire a telecommuter who would LOVE to finally make the jump to Plan 9
without looking back, please drop me a line...

Arnold
-- 
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd.     arnold AT skeeve DOT com
P.O. Box 354            Home Phone: +972  8 979-0381    Fax: +1 206 350 8765
Nof Ayalon              Cell Phone: +972 50  729-7545
D.N. Shimshon 99785     ISRAEL
