Well, as a LanguageTool developer I know all too well that this is not so trivial, as we painfully found out :( You would have to traverse the whole document (including tables and footnotes) to find the text. We never found a proper way to do it and were happy to abandon the approach as soon as the new API was available. Exporting to a text file and running a script on it would actually be the easier option from a developer's point of view. So would a standalone Java program that parses the ODF file as XML.
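
A rough sketch of the "parse ODF as XML" route, done in shell rather than Java. It relies on the fact that an .odt file is a zip archive whose body text lives in content.xml. The helper names strip_tags and odt_text are made up for illustration, `unzip` is assumed to be installed, XML entities such as &amp; are left undecoded, and tags that span a line break are not handled:

```shell
# strip_tags: hypothetical filter that replaces every XML tag on stdin
# with a space, so that words from adjacent elements do not fuse together.
strip_tags() {
    sed 's/<[^>]*>/ /g'
}

# odt_text: hypothetical helper that prints the text of an .odt file.
# Assumes unzip is available; $1 is the path to the document.
odt_text() {
    unzip -p "$1" content.xml | strip_tags
}
```

Something like `odt_text report.odt > report.txt` (report.odt being a placeholder name) would then give a text file to feed to a word-counting script.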

If you need frequency lists very often, then it might make sense to create the extension you describe. What would the frequency list be used for? I simply cannot see a realistic usage scenario outside a scripting environment.

Regards
Marcin

Harold Fuchs wrote:
Thanks but it's not exactly what I had in mind. As far as I know extensions to OOo can be written in Java which, again as far as I know, can handle the associative array you used in your awk example. So, for someone familiar with the ODF structure and API, writing such an extension should be quite simple. Or ???

In addition, OOo can already produce a word *count* so it knows what a "word" is ...

Harold Fuchs
London, England
Please reply *only* to [email protected]



On 06/01/2009 02:18, Marcin Miłkowski wrote:
Save the document as a text file and run this awk script on it from the command line (gawk -f <scriptfile> <filename.txt>):

----------
 # Print list of word frequencies
     {
         for (i = 1; i <= NF; i++)
             freq[$i]++    # count the i-th word of the line, not the whole line ($0)
     }

     END {
         for (word in freq)
             printf "%s\t%d\n", word, freq[word]
     }

--------------

For better results, you could remove all punctuation with a simple search and replace before saving as a text file. An extension doing the same would be easy to write in a language with hash tables like awk's, but a nightmare in one without them.
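
The punctuation cleanup can also be done inside the script instead of by hand. A minimal sketch, assuming a POSIX shell with awk and sort available. The function name wordfreq is made up for illustration, the punctuation list is deliberately short (apostrophes are left alone so that a word like "don't" stays in one piece), and the lowercasing and sorting are additions, not part of the script shown earlier:

```shell
# wordfreq: hypothetical helper that prints "word<TAB>count" lines,
# most frequent word first, ties in byte order.
wordfreq() {
    awk '
    {
        $0 = tolower($0)              # treat "The" and "the" as the same word
        gsub(/[.,;:!?"()]+/, " ")     # replace common punctuation with a space
        for (i = 1; i <= NF; i++)     # fields are re-split after $0 changes
            freq[$i]++
    }
    END {
        for (word in freq)
            printf "%s\t%d\n", word, freq[word]
    }' "$@" | LC_ALL=C sort -k2,2nr -k1,1
}
```

Run it as `wordfreq filename.txt`, or pipe text into it. Replacing punctuation with a space rather than deleting it keeps "word,word" from merging into a single token.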

Best
Marcin


Harold Fuchs wrote:
Is there an extension (or other software) that will produce a word frequency table in Writer (2.4.1 or 3.x)? Where, please? Note: I do not mean a word count but a list of the number of times each word is used in a document.



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]





