Hi Ashwin,

You can try PDFBox to convert the PDF documents to text and then use Lucene to index the text. The code for turning a PDF into text is very simple:

    // needs java.io.IOException plus PDFBox's PDDocument and PDFTextStripper
    // (the package names depend on the PDFBox version you have)
    private static String parseUsingPDFBox(String filename) throws IOException {
        // document reader
        PDDocument doc = PDDocument.load(filename);
        try {
            // create stripper (wish I had the power to do that - wouldn't leave the house)
            PDFTextStripper stripper = new PDFTextStripper();
            // get text from doc using stripper
            return stripper.getText(doc);
        } finally {
            // release the resources held by the document, even if extraction fails
            doc.close();
        }
    }

Sachin

-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]]
Sent: 08 March 2007 09:37
To: java-user@lucene.apache.org
Subject: indexing pdfs

hi
can some one help me by giving any sample programs for indexing pdfs and .doc files
thanks
regards
ashwin
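
For the second step (indexing the extracted text), the idea can be sketched without any library at all. The class below is a toy inverted index in plain Java. It is not Lucene's API and only illustrates the principle that Lucene implements far more thoroughly: split the text into terms and remember which documents contain each term. The class name `ToyTextIndex`, its methods, and the sample file names are all hypothetical and chosen only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class for illustration; not part of Lucene or PDFBox.
public class ToyTextIndex {
    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split the text into lowercase terms and record the document under each term.
    public void add(String docName, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Return the documents containing the term (case-insensitive), or an empty set.
    public Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        ToyTextIndex index = new ToyTextIndex();
        // In practice the text would come from parseUsingPDFBox(filename).
        index.add("report.pdf", "Lucene indexes plain text.");
        index.add("notes.pdf", "PDFBox extracts text from PDF files.");
        System.out.println(index.search("text"));   // prints [notes.pdf, report.pdf]
        System.out.println(index.search("lucene")); // prints [report.pdf]
    }
}
```

With Lucene itself, the same role is played by adding a document with a text field to an IndexWriter. The exact calls depend on the Lucene version, so check the API docs for the release you are using.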