Re: Text extraction tool for Microsoft Office 2007

Michael McCandless Sun, 22 Feb 2009 02:21:39 -0800

Also, that chapter (7) has been rewritten in the revised Lucene in Action (available through Manning's early access now); it's now based entirely on Tika.

But note that Tika has only recently become able to extract text from Office 2007 files (I think):


    https://issues.apache.org/jira/browse/TIKA-152

so you'll have to build from trunk or use the SNAPSHOT from Maven.
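
For a rough idea of what these extractors have to deal with: an Office 2007 file is a ZIP package of XML parts, and a Word document conventionally keeps its body text in word/document.xml. The sketch below uses only the JDK, not POI or Tika, and the class and method names (DocxTextSketch, extractText) are made up for illustration. It ignores headers, footnotes, Excel and PowerPoint, so treat it as a way to see the file layout, not as a replacement for a real parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of POI or Tika.
class DocxTextSketch {

    // Namespace of WordprocessingML elements such as w:p and w:t.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, or "" if word/document.xml is absent. */
    static String extractText(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: the main part sits at the conventional path.
                if ("word/document.xml".equals(entry.getName())) {
                    return readRuns(zip);
                }
            }
        }
        return "";
    }

    // Collects the text of every w:t element, with a newline after each w:p.
    private static String readRuns(InputStream xml) throws IOException {
        StringBuilder text = new StringBuilder();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            boolean inText = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (isWordElement(reader, "t")) inText = true;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (isWordElement(reader, "t")) inText = false;
                    else if (isWordElement(reader, "p")) text.append('\n');
                } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IOException("Could not parse word/document.xml", e);
        }
        return text.toString().trim();
    }

    private static boolean isWordElement(XMLStreamReader reader, String localName) {
        return W_NS.equals(reader.getNamespaceURI())
            && localName.equals(reader.getLocalName());
    }
}
```

The returned string could then be handed to whatever indexing code you already have. For anything beyond a quick experiment, POI or Tika remain the better choice, since they cover the other Office formats and the parts this sketch skips.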

Mike

Otis Gospodnetic wrote:


Hi,

POI - http://poi.apache.org/
or
Tika (it uses POI) - http://lucene.apache.org/tika

And you can use code from Lucene in Action to index the text with Lucene - http://manning.com/hatcher2 . The code is free to download.



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----

From: "Zhang, Lisheng" <[email protected]>
To: [email protected]
Sent: Sunday, February 22, 2009 2:27:06 PM
Subject: Text extraction tool for Microsoft Office 2007

Hi,

What is the best tool (free software) to extract text from
Microsoft Office 2007 files:

Word 2007, Excel 2007, PowerPoint 2007

so that we can index them with Lucene?

Thanks very much for your help, Lisheng

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


