Well, you don't actually need to save the text to disk and then index the saved text file; you can index that text directly in memory.
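To illustrate the in-memory point, here is a minimal sketch that uses only the JDK. It is a toy inverted index standing in for Lucene, not Lucene's API: the class name InMemoryIndexSketch, its add/search methods, and the sample strings are all made up for illustration. The idea it shows is just the data flow: the extracted text stays in a String and goes straight into the index, with no intermediate file. With Lucene itself, the equivalent step would be to pass the extracted String as the value of a field on the document you add to the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory inverted index: a stand-in for Lucene, NOT Lucene's API.
class InMemoryIndexSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> document name
    private final List<String> names = new ArrayList<>();

    // Index text that is already in memory (for example the String returned
    // by a PDF text extractor); nothing is written to disk.
    int add(String name, String text) {
        int id = names.size();
        names.add(name);
        // Lowercase and split on anything that is not a letter or digit.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Names of the documents containing the term, in the order they were added.
    List<String> search(String term) {
        Set<Integer> ids = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(names.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        InMemoryIndexSketch index = new InMemoryIndexSketch();
        // Hypothetical strings standing in for text extracted from PDFs.
        index.add("a.pdf", "Lucene indexes text");
        index.add("b.pdf", "PDFBox extracts text from a PDF");
        System.out.println(index.search("text"));   // prints [a.pdf, b.pdf]
        System.out.println(index.search("lucene")); // prints [a.pdf]
    }
}
```

In practice the second argument to add would be whatever your PDF-to-text routine returns, so the text never has to touch the file system between extraction and indexing.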
The only other way I have heard of is to use IFilters. I believe SeekAFile indexes PDFs.

Sachin

-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 11:35
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs

Is the only way to index PDFs to convert them to text and then index that text?

On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
>
> Hi Ashwin,
>
> You can try PDFBox to convert the PDF documents to text and then use
> Lucene to index the text. The code for turning a PDF into text is very
> simple:
>
> private static String parseUsingPDFBox(String filename) throws IOException
> {
>     // document reader
>     PDDocument doc = PDDocument.load(filename);
>     try {
>         // create stripper (wish I had the power to do that -
>         // wouldn't leave the house)
>         PDFTextStripper stripper = new PDFTextStripper();
>         // get text from doc using stripper
>         return stripper.getText(doc);
>     } finally {
>         // release the resources held by the loaded document
>         doc.close();
>     }
> }
>
> Sachin
>
> -----Original Message-----
> From: ashwin kumar [mailto:[EMAIL PROTECTED]
> Sent: 08 March 2007 09:37
> To: java-user@lucene.apache.org
> Subject: indexing pdfs
>
> Hi, can someone help me by giving any sample programs for indexing
> PDFs and .doc files?
>
> thanks
> regards
> ashwin
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]