excel for indexing the actual content

Jack Krupansky Sun, 27 Jan 2013 11:49:50 -0800

Re-read my last message, and then take a look at that Solr source code, which will give you an idea of how to use Tika, even though you are using Lucene only. If you have specific questions, please be specific.

To answer your latest question: yes, Tika is good enough. Solr's /update/extract handler uses it, and Solr is based on Lucene.
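
As a rough illustration of using Tika with Lucene core only, a minimal sketch might look like the following. It assumes the Tika facade class (`org.apache.tika.Tika`) and the Lucene 4.0 core and analyzers-common jars are on the classpath. The class name `TikaLuceneSketch` and the field names `path` and `contents` are made up for this example, and this is not how Solr itself wires Tika in.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaLuceneSketch {

    // Extracts plain text from one file with Tika and adds it to a Lucene index.
    // "path" and "contents" are hypothetical field names chosen for this sketch.
    public static void indexFile(File indexDir, File source)
            throws IOException, TikaException {
        Tika tika = new Tika();
        // The facade truncates extracted text by default; -1 removes the limit.
        tika.setMaxStringLength(-1);
        // Tika detects the file type (PDF, Word, Excel, ...) and picks a parser.
        String text = tika.parseToString(source);

        Directory dir = FSDirectory.open(indexDir);
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
        IndexWriter writer = new IndexWriter(dir, config);
        try {
            Document doc = new Document();
            // Stored, not tokenized: lets a search hit be mapped back to the file.
            doc.add(new StringField("path", source.getPath(), Field.Store.YES));
            // Tokenized, not stored: the extracted body text to search on.
            doc.add(new TextField("contents", text, Field.Store.NO));
            writer.addDocument(doc);
        } finally {
            writer.close(); // close() also commits the pending document
            dir.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TikaLuceneSketch <indexDir> <fileToIndex>
        indexFile(new File(args[0]), new File(args[1]));
    }
}
```

For very large documents, holding the whole extracted text in one String may be too costly; in that case the facade's `parse(File)` method, which returns a `Reader`, may be the better starting point.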


-- Jack Krupansky

-----Original Message-----
From: saisantoshi

Sent: Sunday, January 27, 2013 2:09 PM
To: [email protected]

Subject: Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content


We are not using Solr, just the Lucene core 4.0 engine. I am trying to
see if we can use the Tika library to extract textual information from
PDF/Word/Excel documents. I am mainly interested in reading the contents
inside the documents and indexing them with Lucene. My question is: is the
Tika framework good enough, or is there a better library? Are there any
issues or experiences with using the Tika framework?

Thanks,
Sai.



--

View this message in context: http://lucene.472066.n3.nabble.com/Readers-for-extracting-textual-info-from-pd-doc-excel-for-indexing-the-actual-content-tp4036379p4036557.html

Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]

For additional commands, e-mail: [email protected]


