[l2h] extracting bag-of-words from latex

Hamilton Link Tue, 14 Feb 2006 14:54:06 -0800

Hi, I'm about to try to process a large set of LaTeX files. What I would like is to strip the files of equations, formatting, comments, etc. to produce a text file of "just the words," so to speak. As far as I can tell, the ways of potentially doing this would be:


- compile the latex to ps or pdf and then run a word extractor on that
- run latex2rtf or latex2html and do word extraction from that
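
For either route, the final word-extraction step is small. Below is a minimal sketch. It assumes Python, and it assumes the converted output is already available as a string: either latex2html's HTML, or plain text pulled out of the PDF/PS by some external tool. `html_to_text` and `bag_of_words` are hypothetical helper names, not part of latex2html. The HTML helper collects only text nodes. Because latex2html typically renders equations as images, they should contribute no words.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Word tokens: runs of letters, optionally with one apostrophe part ("isn't").
# Digits and punctuation are discarded (an assumption about what counts as a word).
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


class _TextCollector(HTMLParser):
    """Hypothetical helper: keeps text nodes, skips <script>/<style> contents.

    Attributes such as an <img> ALT text are never collected, so
    equations rendered as images do not leak into the word list.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. become text
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags from (e.g.) latex2html output, returning only visible text."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens in already-extracted plain text."""
    return Counter(m.group(0).lower() for m in WORD_RE.finditer(text))


if __name__ == "__main__":
    page = '<p>The <b>proof</b> of <img alt="$x^2$" src="img1.png"> the lemma.</p>'
    print(bag_of_words(html_to_text(page)).most_common())
```

For text extracted from PDF/PS, `bag_of_words` can be called directly on the text. Hyphenated line breaks in such text would need extra handling that this sketch does not attempt.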

Does anyone on the list know of a better way, or have any suggestions as to how I might proceed using latex2html, such as configurations or settings that might ease the process?

Please copy me on the response; I'm not subscribed to the latex2html mailing list.


thanks in advance,
hamilton

_______________________________________________
latex2html mailing list
[email protected]
http://tug.org/mailman/listinfo/latex2html
