New Word Document text extractor released

Ryan Ackley Wed, 03 Mar 2004 22:08:04 -0800

Version 0.4 of the TextMining.org text extraction library has been released!


I have finally gotten around to releasing a new version of the
TextMining.org text extractor. This is a pure Java library for extracting
text from Word 6.0/97/2000/XP/2003 documents.

Some highlights from this release:

-I removed support for PDF documents. I was only wrapping the excellent
PDFBox (http://www.pdfbox.org) library with a few lines of code.
-I added support for Word 6.0 documents.
-The extractor no longer extracts text that has been deleted but is
still in the document because of revision tracking.
-I added two exceptions, PasswordProtectedException and FastSavedException,
for more graceful failures.
-Fixed bugs
-Updated the license to Apache 2.0
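
As a rough illustration of how calling code might react to the two new
exceptions, here is a minimal sketch. It is not taken from the library: the
exception classes and the TextExtractor interface below are stand-ins
declared locally so the example compiles on its own, and the library's
actual package, class, and method names may differ. The messages and the
advice about fast saves are likewise assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ExtractionSketch {

    // Stand-in for the library's PasswordProtectedException (assumed to be a checked exception).
    static class PasswordProtectedException extends Exception {
        PasswordProtectedException(String message) { super(message); }
    }

    // Stand-in for the library's FastSavedException (assumed to be a checked exception).
    static class FastSavedException extends Exception {
        FastSavedException(String message) { super(message); }
    }

    // Hypothetical extractor interface; the real entry point may look different.
    interface TextExtractor {
        String extractText(InputStream in)
                throws IOException, PasswordProtectedException, FastSavedException;
    }

    // Runs the extractor and turns each failure mode into a readable status line.
    static String describe(TextExtractor extractor, InputStream in) {
        try {
            return "OK: " + extractor.extractText(in);
        } catch (PasswordProtectedException e) {
            return "SKIPPED: document is password protected";
        } catch (FastSavedException e) {
            return "SKIPPED: document was fast-saved; try re-saving it with fast saves off";
        } catch (IOException e) {
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        InputStream empty = new ByteArrayInputStream(new byte[0]);

        // Fake extractors that simulate each outcome without needing a real Word file.
        TextExtractor plain = in -> "hello";
        TextExtractor locked = in -> { throw new PasswordProtectedException("locked"); };
        TextExtractor fastSaved = in -> { throw new FastSavedException("fast-saved"); };

        System.out.println(describe(plain, empty));
        System.out.println(describe(locked, empty));
        System.out.println(describe(fastSaved, empty));
    }
}
```

Catching the two exceptions separately lets a batch job skip a problem
document with a specific message and move on to the next file.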

Special thanks to BeeText Inc. (http://www.beetext.com). They are a
software company on the cutting edge of software development for
translation professionals, and they sponsored all of the above changes.
Remember: support companies that support open source!

-Ryan Ackley


