Re: xpdf parser usage for lucene

Michael Wechner Tue, 25 Feb 2003 14:11:43 -0800

Pinky Iyer wrote:

Hi!
  I am trying to use xpdf as a PDF parser. The problem is that when I encounter 
a file with a .pdf extension, I call the pdftotext script to convert it to text, which in 
turn uses the file system and leaves the same file with a .txt extension in the same dir. 
How can I get this as a stream and not use the file system at all? Also, how do I 
access the summary and title info?

xpdf has an option to convert the PDF to HTML instead of plain text, which allows you to use an HTMLParser to populate the Lucene fields.
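
On the stream part of the question, here is a minimal sketch in Java, under a few assumptions. pdftotext writes to standard output when the output file name is given as "-", so the text can be read from the child process without a .txt file being left on disk. The -htmlmeta flag is assumed to be the HTML option mentioned above (it wraps the text in simple HTML with meta headers); check the man page of your xpdf version. The class and method names are hypothetical, pdftotext is assumed to be on the PATH, and readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of xpdf or Lucene.
class PdfToTextStream {

    // Builds the pdftotext command line. The trailing "-" asks pdftotext
    // to write to standard output instead of creating a .txt file.
    static List<String> buildCommand(String pdfPath, boolean htmlMeta) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftotext");      // assumption: the binary is on the PATH
        cmd.add("-enc");
        cmd.add("UTF-8");          // ask for UTF-8 output so decoding below matches
        if (htmlMeta) {
            cmd.add("-htmlmeta");  // assumption: simple HTML output with meta headers
        }
        cmd.add(pdfPath);
        cmd.add("-");
        return cmd;
    }

    // Runs pdftotext and returns everything it printed on stdout.
    static String extract(String pdfPath, boolean htmlMeta)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdfPath, htmlMeta));
        // Pass stderr through to the parent so an unread pipe cannot block the child.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        String out;
        try (InputStream in = p.getInputStream()) {
            out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with code " + exit);
        }
        return out;
    }
}
```

The string returned by extract("some.pdf", true) could then be handed to whatever HTML parser is used to fill the fields; wrapping it in a java.io.StringReader gives a Reader if one is needed.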

Concerning the extension: when you create your Lucene document, you could replace the .txt extension with the .pdf extension in the "uri" field.
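
The extension swap could look roughly like this. It is only a sketch: the class and method names are hypothetical, and it only touches paths that actually end in ".txt". The resulting string is what would be stored in the "uri" field.

```java
// Hypothetical helper for mapping the converted file name back to the PDF.
class PdfUri {

    // Returns the path with a trailing ".txt" replaced by ".pdf".
    // Paths that do not end in ".txt" are returned unchanged.
    static String toPdfUri(String txtPath) {
        String suffix = ".txt";
        if (txtPath.endsWith(suffix)) {
            return txtPath.substring(0, txtPath.length() - suffix.length()) + ".pdf";
        }
        return txtPath;
    }
}
```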

HTH

Michael

Anybody who has done this before, please help! Thanks! Pinky Iyer
