On Sat, 5 Aug 2006, Mojca Miklavec wrote:

> I would like to ask how difficult it would be to count the number of
> words in a TeX/ConTeXt document. If it's too complex, please ignore
> the rest of the message.
>
>
> Most recipes for LaTeX say that it's best to do something like
> "pdftotext" and then issue "wc" to count the words in the resulting
> text file, but Windows users don't have "wc" and sometimes you only
> need to know the length of the abstract or so ...
>
> Some time ago Hans mentioned that he counts the number of appearances
> of single characters, but I don't know how difficult it would be to
> extend it to count the number of words.
>
> The problem is not that well defined (how to handle equations, some
> would probably want to exclude headers, footers, buttons, ...), but it
> only needs to be an approximation and "backward compatibility" (in the
> sense that the counter would have to give the same number after some
> years) is not needed at all since algorithms might improve with time
> and the resulting document doesn't really depend on that number, it
> would only be written to the log file.
>
> My idea for the interface would be something like
>
> \startwordcount[abstract]
> \startframedtext
> Bla bla.
> \stopframedtext
> \stopwordcount
>
> which would write something like "abstract: 2 words" to the log file
>
> or
>
> \startstatistics[abstract][words]
> \startframedtext
> Bla bla.
> \stopframedtext
> \stopstatistics
>
> But this is really a low priority. I'm currently using Acrobat to copy
> the text, then I paste it into Office and take a look at statistics
> there when I need to obey some limitations.
>
> So, if there's a simple solution, I would be glad to use it, but if it
> takes too much time to implement it, it's probably not worth the
> effort.
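If wc is not available (the question points out that Windows users usually lack it), a rough stand-in that works directly on the TeX source is a few lines of Python. This is only a sketch under my own assumptions, not how any existing tool does it: it blanks out comments, control sequences, bracketed options, braces and dollar signs with regular expressions, then counts whitespace-separated words, newlines and letters in what remains. Math, macro arguments, headers and footers get no special treatment, so the numbers are approximate, which the question says is acceptable.

```python
import re
import sys

# Crude TeX stripping with regular expressions only, no real TeX parsing.
# Math, macro arguments, headers/footers etc. are NOT handled specially,
# so every count is an approximation.
COMMENT = re.compile(r"(?<!\\)%.*")            # unescaped % up to end of line
CONTROL_WORD = re.compile(r"\\[A-Za-z@]+\*?")  # \startframedtext, \section* ...
CONTROL_SYMBOL = re.compile(r"\\.")            # \%, \&, \\ ...
OPTIONS = re.compile(r"\[[^\]]*\]")            # [abstract], [words] (also eats bracketed prose)
GROUPING = re.compile(r"[{}$]")                # braces and math shifts


def strip_tex(source: str) -> str:
    """Replace markup with spaces so that neighbouring words stay separate."""
    text = COMMENT.sub("", source)
    text = CONTROL_WORD.sub(" ", text)
    text = CONTROL_SYMBOL.sub(" ", text)
    text = OPTIONS.sub(" ", text)
    text = GROUPING.sub(" ", text)
    return text


def count(text: str) -> dict:
    """Rough equivalents of wc -w and wc -l, plus a count of letters."""
    return {
        "words": len(text.split()),
        "lines": text.count("\n"),
        "letters": sum(ch.isalpha() for ch in text),
    }


def main(path: str) -> None:
    with open(path, encoding="utf-8", errors="replace") as handle:
        stats = count(strip_tex(handle.read()))
    for name, value in stats.items():
        print(f"{name}: {value}")


# Only run when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like: python wordcount.py abstract.tex (the script name is made up). On the \startwordcount[abstract] example from the question it reports 2 words, since the control sequences and the [abstract] option are blanked out before counting.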
A very crude approach: there is a program called detex

http://ctan.org/tex-archive/support/detex/

I have not used it, but I think it strips every \something command
from the TeX file. You can then pipe the result through wc to get a
rough estimate of the number of words.

One approach that should work is to map

  \startstatistics[filename][words|letters|lines]

to

  \startbuffer[\jobname-statistics-filename]

and to map \stopstatistics to

  \stopbuffer
  \getbuffer[\jobname-statistics-filename]
  \executesystemcommand{detex \jobname-statistics-filename.tmp | wc <flags corresponding to words|lines|letters>}

(for wc these are -w for words, -l for lines and -m for characters),
and possibly prettify the output so that it is more clearly visible in
the log.

Another approach would be to write a Vim script that counts the number
of words in a visually highlighted area.

Aditya

_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context