Jim Meyering wrote:
> Neo Anderson <[email protected]> wrote:
>> I understand that, so just to ask if it is possible to add a new
>> option e.g.. -utf8 so that wc can count word which is wild characters.
>
> Do you already have an algorithmic definition of "word"
> that makes sense for the locale(s) you care about?
> If so, does it generalize to any other locales?

I looked into this very quickly, and I don't think there is a general
algorithm for it. For languages like Thai, Lao, Chinese or Japanese,
a dictionary lookup is required to determine word boundaries!

http://www.unicode.org/reports/tr29/#Word_Boundaries
http://lists.apple.com/archives/Carbon-dev/2006/Apr/msg00692.html

Coincidentally, I noticed Bruno checked some word boundary stuff
into gnulib today:

http://lists.gnu.org/archive/html/bug-gnulib/2009-02/msg00068.html

cheers,
Pádraig.

_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
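
To illustrate the point about dictionary lookup, here is a rough Python sketch. It is not how wc or gnulib work; the function names and the three-entry dictionary are made up for the example, and greedy longest-match is only one simple way to use a dictionary. It shows that whitespace-delimited counting sees one "word" in a Chinese sentence written without spaces, while a dictionary-driven pass finds three.

```python
def count_whitespace_words(text):
    """Count words the simple way: runs of non-whitespace characters."""
    return len(text.split())


def count_dictionary_words(text, dictionary):
    """Count words by greedy longest-match against a dictionary.

    Hypothetical helper for illustration only. Characters that match
    no dictionary entry are counted as one-character words.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    count = 0
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, falling back to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if length == 1 or text[i:i + length] in dictionary:
                i += length
                count += 1
                break
    return count


# Toy dictionary (an assumption for this example, not a real word list).
toy_dictionary = {"我", "喜欢", "读书"}
sentence = "我喜欢读书"  # "I like reading", written without spaces

print(count_whitespace_words(sentence))                  # 1
print(count_dictionary_words(sentence, toy_dictionary))  # 3
```

The result depends entirely on the dictionary supplied, which is why a single locale-independent definition of "word" is hard to pin down.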
