Re: Readers for extracting textual info from pdf/doc/excel for indexing the actual content

VIGNESH S Mon, 28 Jan 2013 06:37:22 -0800

Apache Tika: you can use it to extract text from PDF and Word documents.

Internally, it uses Apache POI to extract text from Office documents (such as Word and Excel files) and Apache PDFBox to extract text from PDF documents.
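Since the original question asks for an example, here is a minimal sketch of the Tika-to-Lucene flow. It assumes Tika 1.x (tika-core plus tika-parsers, which bring in POI and PDFBox) and Lucene 4.0 (core plus analyzers-common) on the classpath. The class name `TikaLuceneIndexer`, the field names `id` and `contents`, and the helper `countHits` are hypothetical choices for illustration, not part of either library. The `Tika` facade auto-detects the file type and returns plain text, which is then added as a tokenized `TextField`.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical helper class: Tika extracts text, Lucene 4.0 indexes it.
public class TikaLuceneIndexer {

    // The facade auto-detects the format (PDF via PDFBox, Word/Excel via POI)
    // as long as tika-parsers is on the classpath.
    private final Tika tika = new Tika();

    public TikaLuceneIndexer() {
        // Disable the default limit on extracted characters (about 100,000).
        tika.setMaxStringLength(-1);
    }

    /** Extracts plain text from the stream and adds it as one Lucene document. */
    public void index(IndexWriter writer, String id, InputStream in)
            throws IOException, TikaException {
        String text = tika.parseToString(in); // Tika closes the stream
        Document doc = new Document();
        // Stored, un-tokenized identifier (e.g. the file path).
        doc.add(new StringField("id", id, Field.Store.YES));
        // Tokenized full text; not stored, to keep the index small.
        doc.add(new TextField("contents", text, Field.Store.NO));
        writer.addDocument(doc);
    }

    /** Counts documents whose "contents" field contains the (lowercased) term. */
    public static int countHits(Directory dir, String term) throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            return searcher.search(new TermQuery(new Term("contents", term)), 10).totalHits;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_40,
                new StandardAnalyzer(Version.LUCENE_40));
        IndexWriter writer = new IndexWriter(dir, cfg);
        TikaLuceneIndexer indexer = new TikaLuceneIndexer();
        if (args.length > 0) {
            // Index each file named on the command line (.pdf, .doc, .xls, ...).
            for (String path : args) {
                indexer.index(writer, path, new FileInputStream(new File(path)));
            }
        } else {
            // Fallback demo: a small plain-text "document" held in memory.
            byte[] bytes = "Hello Lucene and Tika".getBytes(Charset.forName("UTF-8"));
            indexer.index(writer, "demo.txt", new ByteArrayInputStream(bytes));
        }
        writer.close(); // commits the index
        System.out.println("hits for 'tika': " + countHits(dir, "tika"));
    }
}
```

Pass file paths as arguments to index real PDF or Office files; with no arguments it indexes a short in-memory string. For very large documents, the facade's `parse(File)` method returns a `Reader`, which can be handed to the `TextField(String, Reader)` constructor instead of building one big `String`.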



On Sat, Jan 26, 2013 at 4:24 AM, saisantoshi <[email protected]> wrote:
> I want to index the document content (such as PDF/Word/Excel) and am just
> wondering if there are any good readers that I can integrate into
> Lucene 4.0. Any pointers or example code would be appreciated.
>
> The Lucene in Action book mentions Tika as the library to use, but I am not
> sure if this is the preferred approach. If anyone who has used it could share
> their experiences with this library, that would be greatly appreciated.
>
> An example of how a PDF document can be indexed using either the Tika
> framework or any other reader would be much appreciated.
>
> Thanks,
> Sai.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Readers-for-extracting-textual-info-from-pd-doc-excel-for-indexing-the-actual-content-tp4036379.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>



-- 
Thanks and Regards
Vignesh Srinivasan
9739135640

