Hello,

Below is a diff that changes 'wc' so that its concept of a "word" matches the
definition used by egrep(1), regex(7), regcomp(3), perl, and other tools. In those
tools, a "word" is a sequence of alphanumeric or underscore characters.
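
As an illustration (a sketch, not part of the patch), that definition can be
exercised through regcomp(3)/regexec(3) directly. The pattern "[[:alnum:]_]+" is my
own POSIX ERE spelling of the word class, and count_regex_words is a hypothetical
helper name; it simply counts successive non-overlapping matches:

```c
#include <regex.h>

/* Hypothetical helper: count maximal runs of [[:alnum:]_] in TEXT using
   POSIX regcomp(3)/regexec(3). Returns -1 if the pattern fails to compile. */
int
count_regex_words (const char *text)
{
  regex_t re;
  regmatch_t m;
  int words = 0;
  const char *p = text;

  if (regcomp (&re, "[[:alnum:]_]+", REG_EXTENDED) != 0)
    return -1;

  /* Each successful match is one word. The '+' guarantees a non-empty
     match, so resuming just past rm_eo always makes progress. */
  while (regexec (&re, p, 1, &m, 0) == 0)
    {
      words++;
      p += m.rm_eo;
    }

  regfree (&re);
  return words;
}
```

With this sketch, the sample string in the next paragraph yields 9.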

The wc tool, however, treats every character other than \n, \r, \f, \t, \v, and
<space> as a word character. For example, `wc' counts the text
"The-rain+in(Spain)falls*mainly{on}the......plain" as only one word, while the
tools mentioned above report nine words.
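
To make the current behaviour concrete, here is a simplified sketch of the
separator rule just described. It is not the actual wc.c loop (the line-length
bookkeeping is dropped), and count_words_whitespace is a hypothetical name:

```c
/* Sketch of the pre-patch rule: only \n, \r, \f, \t, \v and space end a word;
   every other byte (including '-', '+', '(', '*', '.') is part of a word. */
int
count_words_whitespace (const char *text)
{
  int words = 0;
  int in_word = 0;

  for (const char *p = text; *p != '\0'; p++)
    {
      switch (*p)
        {
        case '\n': case '\r': case '\f': case '\t': case '\v': case ' ':
          /* Separator: close the current word, if any. */
          if (in_word)
            {
              in_word = 0;
              words++;
            }
          break;
        default:
          in_word = 1;
          break;
        }
    }

  /* Count a word that is still open at end of input. */
  if (in_word)
    words++;
  return words;
}
```

On the example string above it returns 1.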

Good news: the fix is ridiculously simple, only a few lines. Here is the output
of `cvs diff' on the file "textutils-XXX/src/wc.c":

diff -r1.1.1.1 -r1.2
223c223,224
<             switch (*p++)
---
>             int c = (unsigned char) *p++;
>             switch (c)
250c251,254
<                 in_word = 1;
---
>                 if (isalnum (c) || c == '_')
>                   in_word = 1;
>                 else
>                   goto word_separator;
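
For comparison, a standalone sketch of the patched rule under the same
simplifications: count_words_alnum is a hypothetical name, there is no line-length
tracking, and the switch is collapsed into an if/else. The cast to unsigned char
is included because isalnum() is undefined for negative arguments:

```c
#include <ctype.h>

/* Sketch of the patched rule: a byte belongs to a word only if it is
   alphanumeric (per isalnum in the current locale) or '_'. Every other
   byte, including the original whitespace set, separates words. */
int
count_words_alnum (const char *text)
{
  int words = 0;
  int in_word = 0;

  for (const char *p = text; *p != '\0'; p++)
    {
      /* Avoid passing a negative char (bytes >= 0x80 when char is signed). */
      int c = (unsigned char) *p;

      if (isalnum (c) || c == '_')
        in_word = 1;
      else if (in_word)
        {
          /* Leaving a word: the patch's "goto word_separator" path. */
          in_word = 0;
          words++;
        }
    }

  /* Count a word that is still open at end of input. */
  if (in_word)
    words++;
  return words;
}
```

On the example string it returns 9, matching the tools listed above.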

-John Millaway



_______________________________________________
Bug-textutils mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-textutils
