Alternatively, you can use PDFTextStream (http://snowtide.com).
It also has an easy-to-use Lucene API, with code that looks like this:
Document doc = PDFDocumentFactory.buildPDFDocument(pdfFile, config);
indexWriter.addDocument(doc);
One of the nice advantages of this is that the resulting Lucene Document instance can also contain the PDF document's metadata attributes as Lucene Fields.
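To make that idea concrete, here is a minimal sketch in plain Java. It deliberately uses ordinary maps as a stand-in rather than the real PDFTextStream or Lucene classes, so the class name `MetadataFieldsSketch`, the helper `toFields`, the `contents` field name, and the sample metadata values are all hypothetical and only illustrate how metadata attributes can sit next to the extracted text as named fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: plain maps instead of Lucene Document/Field objects.
public class MetadataFieldsSketch {

    // Hypothetical helper: combine the extracted text and the PDF's metadata
    // attributes into one name -> value map, one entry per "field".
    public static Map<String, String> toFields(String text, Map<String, String> metadata) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("contents", text); // assumed name for the main body text
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            // Skip attributes the PDF does not actually define.
            if (entry.getValue() != null) {
                fields.put(entry.getKey(), entry.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample metadata, not taken from any real document.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Title", "Quarterly Report");
        metadata.put("Author", "J. Smith");

        Map<String, String> fields = toFields("extracted page text", metadata);
        fields.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

With real Lucene Fields in place of the map entries, a search could then be restricted to, say, the author or title attribute instead of the full text.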
Here's a longer tutorial that covers this and more:
http://snowtide.com/home/PDFTextStream/techtips/easy_lucene_integration
Chas Emerick [EMAIL PROTECTED] Snowtide Informatics Systems
PDFTextStream: fast PDF text extraction for Java apps and Lucene http://snowtide.com/home/PDFTextStream/
On Nov 16, 2004, at 10:22 AM, Luke Shannon wrote:
www.pdfbox.org
Once you have the package installed, you can use this code:
Document doc = LucenePDFDocument.getDocument(file);
writer.addDocument(doc);
This method returns the PDF's contents as a Lucene Document.
Luke
----- Original Message -----
From: "Miguel Angel" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 16, 2004 10:19 AM
Subject: how do you work with PDF
Hi, I need to know how you work with PDF files. Please describe the process. Thanks...
--
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]