Hi,

Here it is:

http://www.seekafile.org/

-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 13:07
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs

Hi again,

Do we have to download any jar files to run this program? If so, can you
give me the link, please?

ashwin

On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
>
> Well, you don't need to actually save the text to disk and then index
> the saved text file; you can index that text directly in memory.
>
> The only other way I have heard of is to use IFilters. I believe
> SeekAFile does indexing of PDFs.
>
> Sachin
>
> -----Original Message-----
> From: ashwin kumar [mailto:[EMAIL PROTECTED]
> Sent: 08 March 2007 11:35
> To: java-user@lucene.apache.org
> Subject: Re: indexing pdfs
>
> Is the only way to index PDFs to convert them to text and only then
> index that text?
>
>
>
> On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
> >
> > Hi Ashwin,
> >
> > You can try PDFBox to convert the PDF documents to text and then use
> > Lucene to index the text.
> > The code for turning a PDF into text is very simple:
> >
> > // needs java.io.IOException, plus PDDocument and PDFTextStripper
> > // from PDFBox
> > private static String parseUsingPDFBox(String filename) throws IOException
> > {
> >     // document reader
> >     PDDocument doc = PDDocument.load(filename);
> >     try {
> >         // create stripper (wish I had the power to do that -
> >         // wouldn't leave the house)
> >         PDFTextStripper stripper = new PDFTextStripper();
> >         // get text from doc using stripper
> >         return stripper.getText(doc);
> >     } finally {
> >         // always close the document so the file is released
> >         doc.close();
> >     }
> > }
> >
> > Sachin
> >
> > -----Original Message-----
> > From: ashwin kumar [mailto:[EMAIL PROTECTED]
> > Sent: 08 March 2007 09:37
> > To: java-user@lucene.apache.org
> > Subject: indexing pdfs
> >
> > Hi, can someone help me by giving any sample programs for indexing
> > PDFs and .doc files?
> >
> > thanks
> > regards
> > ashwin
> >
> >
> > This message has been scanned for viruses by MailControl - (see
> > http://bluepages.wsatkins.co.uk/?6875772)
> >
> >
> > This email and any attached files are confidential and copyright
> > protected. If you are not the addressee, any dissemination of this
> > communication is strictly prohibited. Unless otherwise expressly
> > agreed in writing, nothing stated in this communication shall be
> > legally binding.
> >
> > The ultimate parent company of the Atkins Group is WS Atkins plc.
> > Registered in England No. 1885586. Registered Office Woodcote
> > Grove, Ashley Road, Epsom, Surrey KT18 5BW.
> >
> > Consider the environment. Please don't print this e-mail unless you
> > really need to.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
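
To illustrate the point made earlier in the thread, that extracted text can be indexed straight from memory without first being written to disk, here is a minimal sketch. It is a toy inverted index written with the Java standard library only, standing in for Lucene; it is not Lucene's API. The class name InMemoryTextIndex, its add and search methods, and the sample file names are all made up for illustration. With Lucene itself, the same String would instead be added as a field on a Document and handed to an IndexWriter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index (hypothetical; a stand-in for Lucene, not its API). */
public class InMemoryTextIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes text that is already held in a String; nothing is written to disk. */
    public void add(String docId, String text) {
        // crude analysis: lower-case, then split on anything that is not a letter or digit
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of the documents containing the term, in sorted order. */
    public List<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InMemoryTextIndex index = new InMemoryTextIndex();
        // In the thread's scenario these strings would come from parseUsingPDFBox(...)
        index.add("report.pdf", "Quarterly report on PDF indexing");
        index.add("notes.pdf", "Notes about indexing Word files");
        System.out.println(index.search("indexing")); // [notes.pdf, report.pdf]
        System.out.println(index.search("pdf"));      // [report.pdf]
    }
}
```

The only thing that matters for the question in the thread is the signature of add: it takes the text as a String, so whatever the PDF-to-text step returns can be passed in directly, with no intermediate text file.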