Mojca Miklavec wrote:
> Hello,
>
> I would like to ask how difficult it would be to count the number of
> words in a TeX/ConTeXt document. If it's too complex, please ignore
> the rest of the message.
>
the way i do such things (and worse trickery) is using pdftotext
you can of course use tex, but then there can be generated words and so on, and it
is insane to use tex (or adapt a tex style) for that; since a word hyphenated at a
line break comes out of pdftotext in two pieces, it may help to run with
(nondestructive)
\setupalign[nothyphenated]
anyhow, here is a script (i could not locate my normal one)
=== wordcount.rb ===
file = ARGV[0]
if file && FileTest.file?(file) then
    begin
        # pass the arguments as a list so that file names with spaces survive
        system("pdftotext", file, "wc.log") or raise "pdftotext failed"
        data = IO.read("wc.log")
        data.gsub!(/\d[\.\:]*\w+/) do ' ' end # remove suffixes
        data.gsub!(/\d/) do ' ' end # remove numbers
        data.gsub!(/\-\s+/m) do '' end # remove hyphenation (rejoin both halves)
        data.gsub!(/\-/m) do ' ' end # split compound words
        # remove punctuation; the end of this character class was lost in the
        # list archive, so everything after \& is a guess
        data.gsub!(/[\.\,\<\>\/\?\\\|\'\"\;\:\[\]\{\}\+\=\-\_\)\(\*\&\^\%\$\#\@\!\~\`]/m) do ' ' end
        words = data.split # split on whitespace, no empty strings
        count = Hash.new(0)
        words.each do |w|
            count[w] += 1
        end
    rescue
        puts("some error #{$!}")
    else
        puts("words : #{words.size}")
        puts("unique : #{count.size}")
        # only list the words when counting succeeded
        if ARGV[1] =~ /list/ then
            puts("\n")
            count.sort.each do |k,v|
                puts("#{k} : #{v}")
            end
        end
    end
end
usage: ruby wordcount.rb filename.pdf [list]
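to try the cleanup steps without a pdf at hand, something like the following rough sketch could be used. the method name word_stats and the sample string are made up for illustration, and the last substitution is an assumption: it simply treats every remaining non-letter as a separator instead of listing the punctuation characters one by one.

```ruby
# word_stats is a hypothetical helper, not part of the script above
def word_stats(text)
  data = text.dup                 # work on a copy, the argument may be frozen
  data.gsub!(/\d[.:]*\w+/, ' ')   # drop numbers with suffixes such as "2nd" or "1.5"
  data.gsub!(/\d/, ' ')           # drop the remaining digits
  data.gsub!(/-\s+/, '')          # rejoin words hyphenated at a line end
  data.gsub!(/-/, ' ')            # split compound words
  data.gsub!(/[^\p{L}\s]/, ' ')   # assumption: any other non-letter is a separator
  count = Hash.new(0)             # word => number of occurrences
  data.split.each { |w| count[w] += 1 }
  count
end

sample = "the well-known type-\nsetter sets type; the 2nd page has 3 words."
count = word_stats(sample)
puts("words : #{count.values.sum}")
puts("unique : #{count.size}")
```

for this sample the hyphenated "type-" / "setter" pair is counted as the single word "typesetter", "well-known" as two words, and "2nd" and "3" are not counted at all. counting is case sensitive, just as in the script.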
if this kind of stuff is useful, we can add it to one of the scripts that come
with context
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------