Alternatively, you can use PDFTextStream (http://snowtide.com).

It also has an easy-to-use Lucene API, with code that looks like this:

Document doc = PDFDocumentFactory.buildPDFDocument(pdfFile, config);
indexWriter.addDocument(doc);

One advantage of this approach is that the resulting Lucene Document instance can also carry the PDF document's metadata attributes as Lucene Fields.
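To make the metadata point concrete: a Lucene Document is essentially a set of named Fields, so attributes such as Title or Author can sit next to the body text and be matched separately from it. The plain-Java sketch below models that idea with a Map. It does not use the real Lucene or PDFTextStream classes. The class name, the `contents` field name, and the lower-casing convention are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Toy model (NOT the Lucene or PDFTextStream API): a "document" is a map of
 * field names to values, roughly how a Lucene Document groups its Fields.
 */
public class PdfFieldsSketch {

    /** Combines extracted body text with PDF metadata attributes into one field map. */
    public static Map<String, String> toFields(String bodyText, Map<String, String> pdfMetadata) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("contents", bodyText); // hypothetical name for the full-text field
        for (Map.Entry<String, String> e : pdfMetadata.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isBlank()) {
                continue; // skip metadata attributes that are missing or empty
            }
            // Hypothetical convention: "Title" becomes the field "title", etc.
            fields.put(e.getKey().toLowerCase(Locale.ROOT), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("Title", "Quarterly Report");
        meta.put("Author", "J. Smith");
        meta.put("Subject", ""); // empty, so it is not turned into a field
        System.out.println(toFields("Revenue grew ...", meta));
    }
}
```

With real Lucene, each map entry would instead become a Field added to the Document, so a query could target the author field alone.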

Here's a longer tutorial that covers this and more:

http://snowtide.com/home/PDFTextStream/techtips/easy_lucene_integration

Chas Emerick
[EMAIL PROTECTED]
Snowtide Informatics Systems

PDFTextStream: fast PDF text extraction for Java apps and Lucene
http://snowtide.com/home/PDFTextStream/

On Nov 16, 2004, at 10:22 AM, Luke Shannon wrote:

www.pdfbox.org

Once you have the package installed, you can use this code:

Document doc = LucenePDFDocument.getDocument(file);
writer.addDocument(doc);

This method returns the PDF as a Lucene Document, ready to be added to the index.
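The snippet above handles one file; to index a whole folder of PDFs you would run it once per file. Below is a minimal sketch, using only java.nio, of collecting those files. The class and method names are hypothetical, and the PDFBox calls appear only as comments because they need PDFBox and Lucene on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper: finds every .pdf file under a directory so each one can
 * be passed to LucenePDFDocument.getDocument(file) and then writer.addDocument(doc).
 */
public class PdfFileCollector {

    /** Recursively lists regular files whose name ends in ".pdf" (case-insensitive). */
    public static List<Path> findPdfs(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // close the directory stream
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path pdf : findPdfs(root)) {
            // With PDFBox and Lucene available, this is where you would call:
            //   Document doc = LucenePDFDocument.getDocument(pdf.toFile());
            //   writer.addDocument(doc);
            System.out.println(pdf);
        }
    }
}
```

Matching the extension case-insensitively also picks up files named like `REPORT.PDF`.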

Luke

----- Original Message -----
From: "Miguel Angel" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 16, 2004 10:19 AM
Subject: how do you work with PDF


Hi, I need to know how you work with PDFs. Please describe the process.
Thanks...

--
Miguel Angel Angeles R.
Connectivity and Server Consulting
Tel. 97451277

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



