On Mon, 2 Feb 2015 10:20:15 +0100 Keith Schultz <keithjschu...@icloud.com> wrote:
> Hello All,
>
> As a linguist, I can say that not counting words that are shorter is
> an absolute NO-GO for an accurate word count and thereby character
> count!
>
> See below, for a non-representative proof!
>
> > On 01.02.2015 at 22:12, Wolfgang Schuster
> > <schuster.wolfg...@gmail.com> wrote:
> >
> > [snip, snip]
> >
> > ConTeXt has an option to count the words (you find the result in
> > <jobname>.words) in a document but words shorter than four
> > letters aren’t taken into account.
>
> word length under 4 characters : 10
> word length =< 4 chars : 20
>
> here you are missing a third of the words! That is 30%
>
> regards
> Keith

See also:

Zipf, G. K. (1949), "Human Behavior and the Principle of Least Effort",
Cambridge, MA: Addison-Wesley.
In particular, Chapter 2: On the Economy of Words.

As well as:

Shannon, C. E. (1951), "The redundancy of English", Cybernetics, 248-272.

54% for English, so we can afford to be sloppy
(wch s wy txt compr qte ll).

Alan
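
For anyone who wants to check the effect of a length cutoff on their own text, here is a rough sketch in Python. It is not how ConTeXt counts words: the helper name `count_words` and the definition of a word as an unbroken run of letters are assumptions made only for this illustration.

```python
import re


def count_words(text, min_length=1):
    """Count words with at least min_length letters.

    Hypothetical helper: a "word" here is any unbroken run of letters,
    which is an assumption and not ConTeXt's own definition.
    """
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters only
    return sum(1 for w in words if len(w) >= min_length)


sample = ("As a linguist, I can say that not counting words "
          "that are shorter is a no-go")

total = count_words(sample)                     # every word
long_only = count_words(sample, min_length=4)   # four letters or more
missed = total - long_only

print(f"all words: {total}, four letters or more: {long_only}, "
      f"missed: {missed} ({missed / total:.0%})")
```

Running the same comparison on a real document shows how much a four-letter cutoff drops; short function words are frequent, so the share is usually far from negligible.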