On Thu 10 Jan 2013 04:21:10 NZDT +1300, Keith McGavin wrote:

> tr removes singular dots which wc may pick up as words.
> 
> pdftotext file.pdf - | tr -d '.' | wc -w
> pdftotext file.pdf - | tr -d '.' | wc -l

That's only the start of it. Many PDFs are constructed in such a way
that the extracted plain text contains stray spaces within words, which
inflates the count, and words hyphenated across a line break are counted
as two. A PDF viewer application might be more accurate (if it has a
word count option), but that's not command line.
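
The line-break hyphenation part, at least, can be worked around on the
command line. A rough sketch, building on the quoted pipeline: the
function name count_words is made up for illustration, and it assumes
that a line ending in "-" always continues a word on the next line. It
does nothing about the stray spaces inside words.

```shell
# Hypothetical helper: count words on stdin after rejoining words
# that were hyphenated across a line break.
count_words() {
    awk '
        # Line ends in a hyphen: drop the hyphen and print without a
        # newline, so the next line continues the same word.
        /-$/ { sub(/-$/, ""); printf "%s", $0; next }
        # Any other line is passed through unchanged.
        { print }
    ' | tr -d '.' | wc -w
}
```

Usage would be: pdftotext file.pdf - | count_words

A real compound such as "well-known" that happens to break at its
hyphen loses the hyphen here, but it still counts as one word, so the
total is not affected.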

Volker

-- 
Volker Kuhlmann
http://volker.dnsalias.net/     Please do not CC list postings to me.
_______________________________________________
Linux-users mailing list
[email protected]
http://lists.canterbury.ac.nz/mailman/listinfo/linux-users
