This means that I have to use the HTMLParser again on the converted document. Is that right? Also, is there a way to use these tools without touching the filesystem, for example by way of streams?

Michael Wechner <[EMAIL PROTECTED]> wrote:
Pinky Iyer wrote:
> Hi!
>
> I am trying to use xpdf as a PDF parser. When I encounter a file with a
> .pdf extension, I call the pdftotext script to convert it to text, which
> uses the file system and leaves a file with the same name and a .txt
> extension in the same directory. How can I get the text as a stream and
> not use the file system at all? Also, how do I access the summary and
> title info?

xpdf has an option to turn the PDF into HTML instead of text, which allows you to use an HTMLParser for populating the fields.

Concerning the extension: when you create your Lucene document, you could replace the txt extension with the pdf extension in the "uri" field.

HTH

Michael

> Anybody who has done this before, please help!
> Thanks!
> Pinky Iyer
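
On the streams question, here is a minimal Java sketch of one possible approach. It assumes the pdftotext binary from xpdf is on the PATH and that your version accepts "-" as the output file name (meaning stdout) as well as the -htmlmeta and -enc options; please check the man page for your build. The class and method names (PdfTextExtractor, buildCommand, extractHtml, toPdfUri) are made up for illustration and are not part of xpdf or Lucene. The PDF itself is still read from disk; only the converted output is kept off the filesystem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class PdfTextExtractor {

    // Builds the pdftotext command line (hypothetical helper).
    // Assumption: the trailing "-" makes pdftotext write to stdout, and
    // -htmlmeta wraps the text in simple HTML that includes the meta info.
    public static List<String> buildCommand(String pdfPath) {
        return Arrays.asList("pdftotext", "-htmlmeta", "-enc", "UTF-8", pdfPath, "-");
    }

    // Reads a stream to the end and decodes it as UTF-8.
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Runs pdftotext and returns its stdout as a string, so no .txt or
    // .html file is left next to the PDF.
    public static String extractHtml(String pdfPath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdfPath));
        // Send pdftotext's error messages to our own stderr so the
        // child process cannot block on a full stderr pipe.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        String html;
        try (InputStream out = p.getInputStream()) {
            html = readAll(out);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with status " + exit);
        }
        return html;
    }

    // Only needed if you keep the file-based conversion: maps the name of
    // the converted .txt file back to the original .pdf for the "uri" field.
    public static String toPdfUri(String uri) {
        return uri.endsWith(".txt")
                ? uri.substring(0, uri.length() - 4) + ".pdf"
                : uri;
    }
}
```

The string returned by extractHtml could then be wrapped in a java.io.StringReader and handed to whatever HTML parser you already use to populate the title and summary fields.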