Re: Enhancement request to wc

Pádraig Brady Sun, 08 Feb 2009 14:41:33 -0800

Jim Meyering wrote:
> Neo Anderson <[email protected]> wrote:
>> I understand that, so I just want to ask whether it is possible to add a new
>> option, e.g. -utf8, so that wc can count words made of wide characters.
> 
> Do you already have an algorithmic definition of "word"
> that makes sense for the locale(s) you care about?
> If so, does it generalize to any other locales?
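For comparison, the simplest algorithmic definition is "a maximal run of non-whitespace characters", which is roughly the rule wc -w applies. The Python sketch below is an illustration only, not coreutils code, and the function name is hypothetical. It applies that rule to Unicode text and shows where it stops being useful: scripts written without spaces between words collapse into a single "word".

```python
def count_words_whitespace(text: str) -> int:
    """Count maximal runs of non-whitespace characters.

    str.split() with no arguments splits on any Unicode whitespace,
    including U+3000 IDEOGRAPHIC SPACE, so this is locale-independent.
    """
    return len(text.split())


if __name__ == "__main__":
    print(count_words_whitespace("the quick brown fox"))  # 4
    print(count_words_whitespace("a\u3000b"))  # 2: ideographic space separates
    # Thai and Chinese are normally written without spaces between words,
    # so each whole sentence counts as one "word" under this rule.
    print(count_words_whitespace("ฉันรักภาษาไทย"))  # 1
    print(count_words_whitespace("我喜欢学习中文"))  # 1
```

The rule generalizes to any script that separates words with spaces, which is exactly why it does not help for the locales in question.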


Looked into this very quickly.
I don't think there is a general algorithm for this.
Languages like Thai, Lao, Chinese and Japanese are written
without spaces between words, so a dictionary lookup
is required to determine word boundaries!
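To illustrate why a dictionary is needed, here is a minimal Python sketch of greedy longest-match ("maximal matching") segmentation. This is a simple technique chosen for illustration, not what gnulib or any particular tool implements, and the word lists are tiny hypothetical dictionaries. At each position it takes the longest dictionary entry that matches, falling back to a single character.

```python
from __future__ import annotations


def segment_longest_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right longest-match word segmentation.

    At each position, take the longest dictionary entry that matches;
    if none matches, emit one character so the scan always advances.
    Whitespace is skipped and never forms part of a word.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        match = text[i]  # fallback: a single unknown character
        # Try candidate lengths from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            candidate = text[i:j]
            if candidate in dictionary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


if __name__ == "__main__":
    toy_dict = {"我", "喜欢", "学习", "中文"}  # hypothetical word list
    print(segment_longest_match("我喜欢学习中文", toy_dict))
    # ['我', '喜欢', '学习', '中文'] -> 4 words

    # Greedy matching can still pick the wrong split: the intended
    # reading is 研究/生命/起源 ("study the origin of life").
    ambiguous = {"研究", "研究生", "生命", "命", "起源"}
    print(segment_longest_match("研究生命起源", ambiguous))
    # ['研究生', '命', '起源']
```

The second example shows that even with a dictionary, the result depends on the matching strategy, which is one reason a simple wc option would not give a reliable word count for these languages.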

http://www.unicode.org/reports/tr29/#Word_Boundaries
http://lists.apple.com/archives/Carbon-dev/2006/Apr/msg00692.html

Coincidentally, I noticed Bruno checked some word boundary
code into gnulib today:
http://lists.gnu.org/archive/html/bug-gnulib/2009-02/msg00068.html

cheers,
Pádraig.


_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
