Pinky Iyer wrote:

Hi !
  I am trying to use xpdf for pdf parser, the problem i encounter is when i encounter 
a file with .pdf extension, i call the pdftotext script to convert to text, which in 
turn uses the file system and leaves the same file with .txt extension in same dir. 
How can i get this as a stream and not use the file system at all. Also How do i 
access the summary and title info.


xpdf has an option to turn the PDF into an HTML instead of txt, which allows you to use an HTMLParser
for populating the fields.


Concerning the extension: when you create your Lucene document, you could replace the txt extension
by the pdf extension in the case of the "uri" field.


HTH

Michael

Anybody who has done this before, please help!
Thanks!
Pinky Iyer




---------------------------------
Do you Yahoo!?
Yahoo! Tax Center - forms, calculators, tips, and more





--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to