my experiences - Re: Parsing Word Docs

David Spencer Wed, 05 Mar 2003 15:23:51 -0800

FYI, I tried the textmining.org/POI combo on a collection of 350 Word
docs people have developed here over the years, and it failed on 33% of them,
throwing exceptions that said the formats were invalid.

I tried "antiword" (http://www.winfield.demon.nl/), a free native executable, and it worked great (well, it seemed to process all the files fine).

I've had similar experiences with PDF: I tried the three or so freeware/Java PDF text extractors, and none of them was as good as the native executable pdftotext from foolabs (http://www.foolabs.com/xpdf/).

It's not satisfying to a Java developer, but these work better than anything else I can find.

You get the source, and I use them on Windows and Linux with no problems.
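Because these are command-line programs, a Java indexer has to shell out to them. Below is a minimal sketch using the standard `ProcessBuilder` API, assuming `antiword` and `pdftotext` are installed and on the `PATH`. The class and method names (`ExternalExtractor`, `run`, `wordToText`, `pdfToText`) are hypothetical and not part of any library. The sketch relies on `antiword` printing the document text to standard output, and on `pdftotext` writing to standard output when the output file name is `-`. It needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.List;

/**
 * Hypothetical helper: runs a native text extractor and returns
 * whatever it writes to standard output.
 */
public class ExternalExtractor {

    /** Runs the command, returns its stdout, and throws if the exit code is non-zero. */
    public static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Let the tool's error messages go straight to our console instead of
        // buffering them, so an unread stderr pipe cannot stall the child.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();

        // Drain stdout completely before waiting, so the child never blocks on a full pipe.
        byte[] out;
        try (InputStream in = p.getInputStream()) {
            out = in.readAllBytes();
        }

        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command failed with exit code " + exit + ": " + command);
        }
        // Assumption: the tool's output matches the platform default charset;
        // the actual encoding depends on how antiword/pdftotext are configured.
        return new String(out, Charset.defaultCharset());
    }

    /** antiword prints the plain text of a .doc file to stdout. */
    public static String wordToText(String docPath) throws IOException, InterruptedException {
        return run(List.of("antiword", docPath));
    }

    /** pdftotext writes to stdout when the output file argument is "-". */
    public static String pdfToText(String pdfPath) throws IOException, InterruptedException {
        return run(List.of("pdftotext", pdfPath, "-"));
    }

    public static void main(String[] args) throws Exception {
        // Pick the extractor by file extension and print the extracted text.
        for (String path : args) {
            String text = path.toLowerCase().endsWith(".pdf") ? pdfToText(path) : wordToText(path);
            System.out.println(text);
        }
    }
}
```

Usage: `java ExternalExtractor report.doc paper.pdf`. The returned string can then be handed to the indexer in place of the text a Java parser would have produced.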

Eric Anderson wrote:

I'm interested in using the textmining/textextraction utilities built on Apache POI that Ryan was discussing. However, I'm having some difficulty determining where the insertion point would be to replace the default parser with the Word parser.

Any assistance would be appreciated.
LanRx Network Solutions, Inc.
Providing Enterprise Level Solutions...On A Small Business Budget
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
