Mojca Miklavec wrote:
> Hello,
>
> I would like to ask how difficult it would be to count the number of
> words in a TeX/ConTeXt document. If it's too complex, please ignore
> the rest of the message.
>   
the way i do such things (and worse trickery) is using pdftotext 

you can of course use tex, but then there can be generated words and so on, and it 
is insane to use tex (or adapt a tex style) for that; it may help to run with 
the (nondestructive) setting

\setupalign[nothyphenated]

anyhow, here is a script (i could not locate my normal one) 

=== wordcount.rb ===

if (file = ARGV[0]) && file && FileTest.file?(file) then
    begin
        system("pdftotext", file, "wc.log")  # separate arguments, so file names with spaces work
        data = IO.read("wc.log")
        data.gsub!(/\d[\.\:]*\w+/o) do ' ' end  # remove suffixes
        data.gsub!(/\d/o)           do ' ' end  # remove numbers
        data.gsub!(/\-\s+/mo)       do ' ' end  # remove hyphenation
        data.gsub!(/\-/mo)          do ' ' end  # split compound words
        data.gsub!(/[\.\,\<\>\/\?\\\|\'\"\;\:\]\[\{\}\+\=\-\_\)\(\*\&\^\%\$\#\@\!\~\`]/mo) do ' ' end  # remove punctuation
        words = data.split  # plain split skips leading whitespace, so no empty first word
        count = Hash.new
        words.each do |w|
            count[w] = (count[w] || 0) + 1
        end
    rescue
        puts("some error #{$!}")
    else
        puts("words  : #{words.size}")
        puts("unique : #{count.size}")
    end
    if ARGV[1] =~ /list/ then
        puts("\n")
        count.sort.each do |k,v|
            puts("#{k} : #{v}")
        end
    end
end


usage: ruby wordcount.rb filename.pdf [list] 
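
to see what the cleanup steps do without needing a pdf at hand, here is a small sketch that applies the same kind of substitutions to a plain string. the helper name count_words and the sample sentence are made up for illustration, and the punctuation step is simplified to "anything that is not a letter or whitespace", which is an assumption and not the character class of the script above:

```ruby
# sketch only: count_words is a hypothetical helper, not part of any context script
def count_words(text)
    data = text.dup
    data.gsub!(/\d[\.\:]*\w+/, ' ')   # remove numbers with suffixes (2nd, 1.5em)
    data.gsub!(/\d/, ' ')             # remove remaining digits
    data.gsub!(/\-\s+/m, ' ')         # hyphen at a line end: the two halves become two words
    data.gsub!(/\-/, ' ')             # split compound words
    data.gsub!(/[^\p{L}\s]/, ' ')     # assumption: every other non-letter counts as punctuation
    count = Hash.new(0)
    data.split.each { |w| count[w] += 1 }
    count
end

sample = "the well-known word-\ncount test, the 2nd one."
count = count_words(sample)
puts("words  : #{count.values.inject(0, :+)}")
puts("unique : #{count.size}")
```

for this sample it reports 8 words and 7 unique ones: "2nd" is dropped, "well-known" counts as two words, and the "word-" / "count" pair broken over a line end also counts as two words. that last effect is the reason why running with hyphenation disabled gives a more reliable count.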

if this kind of stuff is useful, we can add it to one of the scripts that come 
with context 

Hans 
-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------
