Hi,

Here it is:

http://www.seekafile.org/ 

-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED] 
Sent: 08 March 2007 13:07
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs

hi again
do we have to download any jar files to run this program if so can u
give me the link pls

ashwin

On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
>
> Well you don't need to actually save the text to disk and then index 
> the saved index file, you can directly index that text in-memory.
>
> The only other way I have heard of is to use Ifilters.  I believe 
> SeekAFile does indexing of pdfs.
>
> Sachin
>
> -----Original Message-----
> From: ashwin kumar [mailto:[EMAIL PROTECTED]
> Sent: 08 March 2007 11:35
> To: java-user@lucene.apache.org
> Subject: Re: indexing pdfs
>
> Is the only way index pdfs is to convert it into a text and then only 
> index it ???
>
>
>
> On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
> >
> > Hi Aswin,
> >
> > You can try pdfbox to convert the pdf documents to text and then use

> > Lucene to index the text.  The code for turning a pdf to text is 
> > very
> > simple:
> >
> > private static string parseUsingPDFBox(string filename)
> >         {
> >             // document reader
> >             PDDocument doc = PDDocument.load(filename);
> >             // create stripper (wish I had the power to do that - 
> > wouldn't leave the house)
> >             PDFTextStripper stripper = new PDFTextStripper();
> >             // get text from doc using stripper
> >             return stripper.getText(doc);
> >         }
> >
> > Sachin
> >
> > -----Original Message-----
> > From: ashwin kumar [mailto:[EMAIL PROTECTED]
> > Sent: 08 March 2007 09:37
> > To: java-user@lucene.apache.org
> > Subject: indexing pdfs
> >
> > hi can some one help me by giving any sample programs for indexing 
> > pdfs and .doc files
> >
> > thanks
> > regards
> > ashwin
> >
> >
> > This message has been scanned for viruses by MailControl - (see
> > http://bluepages.wsatkins.co.uk/?6875772)
> >
> >
> > This email and any attached files are confidential and copyright 
> > protected. If you are not the addressee, any dissemination of this 
> > communication is strictly prohibited. Unless otherwise expressly 
> > agreed in writing, nothing stated in this communication shall be
> legally binding.
> >
> > The ultimate parent company of the Atkins Group is WS Atkins plc.
> > Registered in England No. 1885586.  Registered Office Woodcote 
> > Grove, Ashley Road, Epsom, Surrey KT18 5BW.
> >
> > Consider the environment. Please don't print this e-mail unless you 
> > really need to.
> >
> > --------------------------------------------------------------------
> > - To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to