Re: Indexing other documents (.pdf et .doc)

Michael Wechner Thu, 19 Dec 2002 12:28:14 -0800

Friaa Nafaa wrote:

 Hello,I use Lucene with Tomcat and I can now index and search all html documents. But I would like to index other documents such us pdf or Word (.doc), I hope that sameone can help me !

Concerning PDF:

Before indexing you should extract the text from the PDF and save it
as .txt (Then you can index the .txt, but reference the PDF uri). To do this have a look at

http://www.foolabs.com/xpdf/download.html

or

http://www.pdfbox.org/

These links are listed at

http://jakarta.apache.org/lucene/docs/contributions.html

Also take a look at the FAQ

HTH

Michael

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Re: Indexing other documents (.pdf et .doc)

Reply via email to