Friaa Nafaa wrote:
Concerning PDF: Hello, I use Lucene with Tomcat and I can now index and search all HTML documents. But I would like to index other documents such as PDF or Word (.doc). I hope that someone can help me!
Before indexing, you should extract the text from the PDF and save it
as .txt (then you can index the .txt, but reference the PDF URI). To do this, have a look at
http://www.foolabs.com/xpdf/download.html
or
http://www.pdfbox.org/
These links are listed at
http://jakarta.apache.org/lucene/docs/contributions.html
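A minimal sketch of the "extract to .txt, index the .txt, but point at the PDF" idea follows. It deliberately uses only the JDK so it runs without Lucene or PDFBox. The text is assumed to have been extracted already, for example with xpdf's pdftotext. A tiny in-memory term map stands in for a real Lucene index. The class name `PdfSidecarIndex` and its methods are hypothetical and are not part of any library. The sketch shows that the words come from the .txt file, while every search hit records the PDF's URI.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; a real setup would use a Lucene index instead of this map.
final class PdfSidecarIndex {
    // term -> URIs of the ORIGINAL documents (the PDFs, not the .txt files)
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Saves already-extracted text next to the PDF as "<name>.txt" and returns that path. */
    static Path writeSidecar(Path pdf, String extractedText) throws IOException {
        String name = pdf.getFileName().toString();
        String base = name.toLowerCase(Locale.ROOT).endsWith(".pdf")
                ? name.substring(0, name.length() - 4)
                : name;
        Path txt = pdf.resolveSibling(base + ".txt");
        Files.writeString(txt, extractedText, StandardCharsets.UTF_8);
        return txt;
    }

    /** Indexes the words of the .txt file, but records the PDF's URI as the hit. */
    void indexSidecar(Path txt, String pdfUri) throws IOException {
        String text = Files.readString(txt, StandardCharsets.UTF_8);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(pdfUri);
            }
        }
    }

    /** Returns the PDF URIs whose extracted text contains the term (case-insensitive). */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pdfidx");
        Path pdf = dir.resolve("report.pdf");
        // Assumption: this string was produced earlier by an external extractor such as pdftotext.
        Path txt = writeSidecar(pdf, "Lucene indexes plain text");
        PdfSidecarIndex index = new PdfSidecarIndex();
        index.indexSidecar(txt, pdf.toUri().toString());
        // Prints the PDF's file: URI, not the .txt path.
        System.out.println(index.search("lucene"));
    }
}
```

With a real Lucene index, the same idea should carry over. You would index the .txt contents in one field and store the PDF URI in a separate field that the search results display.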
Also take a look at the FAQ.
HTH
Michael
