Hi Aswin,
You can try PDFBox to convert the PDF documents to text and then use
Lucene to index that text. The code for turning a PDF into text is very
simple:
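// needs java.io.IOException plus PDDocument and PDFTextStripper from
// PDFBox (the package names depend on the PDFBox version)
private static String parseUsingPDFBox(String filename) throws IOException
{
    // document reader
    PDDocument doc = PDDocument.load(filename);
    try {
        // create stripper (wish I had the power to do that -
        // wouldn't leave the house)
        PDFTextStripper stripper = new PDFTextStripper();
        // get text from doc using stripper
        return stripper.getText(doc);
    } finally {
        // release the resources held by the document
        doc.close();
    }
}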
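To give a rough feel for what the indexing step does with that text, here
is a toy inverted index in plain Java. This is only an illustration and
not Lucene's API: the class name ToyTextIndex and its add/search methods
are made up for this sketch, and Lucene itself adds analysis, scoring and
on-disk storage on top of the same basic idea.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lower-cased word to the documents containing it.
// Hypothetical illustration only; this is not Lucene's API.
class ToyTextIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split the text on anything that is not a letter or digit and
    // record the document name under every word found.
    void add(String docName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Return the names of the documents containing the word (empty set if none).
    Set<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```

You would call add(filename, parseUsingPDFBox(filename)) for each PDF and
then search("someword") to get back the matching file names. With Lucene
the same text would instead go into a field of a Document that you pass
to IndexWriter.addDocument; the exact field and writer constructors vary
between Lucene versions, so check the docs for the release you are using.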
Sachin
-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 09:37
To: [email protected]
Subject: indexing pdfs
Hi, can someone help me by giving any sample programs for indexing PDFs
and .doc files?
thanks
regards
ashwin