On Mon, 2 Feb 2015 10:20:15 +0100
Keith Schultz <keithjschu...@icloud.com> wrote:

> Hello All,
> 
> As a linguist, I can say that leaving out the shorter words is an
> absolute NO-GO for an accurate word count, and thereby for an
> accurate character count!
> 
> See below for a non-representative proof!
> 
> > Am 01.02.2015 um 22:12 schrieb Wolfgang Schuster
> > <schuster.wolfg...@gmail.com>:
> > 
> [snip, snip]
> 
> > ConTeXt has an option to count the words in a document (you find
> > the result in <jobname>.words), but words shorter than four
> > letters aren’t taken into account.
>
> word length under 4 characters : 10
> word length >= 4 characters    : 20
> 
> Here you are missing a third of the words! That is about 33%.
> 
> regards
>       Keith
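
To make the effect of such a cutoff concrete, here is a minimal Python
sketch. It is not how ConTeXt counts words; the function name
count_words, the min_length parameter and the sample sentence are all
made up for illustration, and a "word" is simply taken to be a run of
letters.

```python
import re

def count_words(text, min_length=1):
    """Count runs of letters that are at least min_length letters long."""
    # [^\W\d_]+ matches one or more Unicode letters (no digits, no underscore).
    words = re.findall(r"[^\W\d_]+", text)
    return sum(1 for w in words if len(w) >= min_length)

# Hypothetical sample text, chosen only for illustration.
sample = "As a linguist I can say that not counting short words is a no go"

total = count_words(sample)                  # every word
counted = count_words(sample, min_length=4)  # only words of four or more letters
missed = total - counted

print(f"all words: {total}")
print(f"counted with cutoff: {counted}")
print(f"share missed: {missed / total:.0%}")
```

How much is lost depends entirely on the text, so the share for a real
document has to be measured rather than assumed.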



See also:
Zipf, G. K. (1949), "Human Behavior and the Principle of Least Effort",
Cambridge, MA: Addison-Wesley.

In particular, Chapter 2: On the Economy of Words.


As well as:
Shannon, C. E. (1951), "The redundancy of English", Cybernetics,
248-272.

That is 54% redundancy for English, so we can afford to be sloppy
(wch s wy txt compr qte ll).


Alan