Hi again,
Do we have to download any JAR files to run this program? If so, can you
give me the link, please?

ashwin

On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:

Well, you don't need to actually save the text to disk and then index the
saved text file; you can index that text directly in memory.
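
To make "index it in memory" concrete, here is a minimal sketch in plain Java. It is a toy inverted index, not Lucene's API: the names InMemoryIndexSketch, addDocument and search are made up for illustration, and the sample strings stand in for whatever the PDF text extractor returns. The point is only that the extracted String is handed straight to the indexer with no intermediate file. With Lucene the shape is the same: the String goes into a field of a document, and the document is added through the index writer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index; a stand-in for a real indexer such as Lucene. */
public class InMemoryIndexSketch {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Index text that is already in memory, e.g. the String returned by a PDF text extractor. */
    public void addDocument(String docId, String text) {
        // Lowercase, then split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Return the ids of documents containing the term, sorted alphabetically. */
    public List<String> search(String term) {
        Set<String> hits = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        InMemoryIndexSketch index = new InMemoryIndexSketch();
        // In the setup discussed here, these strings would come from the PDF-to-text step.
        index.addDocument("report.pdf", "Lucene indexes text, not PDF files.");
        index.addDocument("notes.pdf", "Extract the text first, then index it.");
        System.out.println(index.search("text"));   // prints [notes.pdf, report.pdf]
        System.out.println(index.search("lucene")); // prints [report.pdf]
    }
}
```

Nothing is written to disk at any point; the only input to addDocument is the text itself.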

The only other way I have heard of is to use IFilters.  I believe
SeekAFile does indexing of PDFs.

Sachin

-----Original Message-----
From: ashwin kumar [mailto:[EMAIL PROTECTED]
Sent: 08 March 2007 11:35
To: java-user@lucene.apache.org
Subject: Re: indexing pdfs

Is the only way to index PDFs to convert them into text and then index
that text?



On 3/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
>
> Hi Ashwin,
>
> You can try PDFBox to convert the PDF documents to text and then use
> Lucene to index the text.  The code for turning a PDF into text is very
> simple:
>
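> // Needs PDDocument and PDFTextStripper from the PDFBox jar (package
> // names depend on the PDFBox version), plus java.io.IOException.
> private static String parseUsingPDFBox(String filename) throws IOException
>         {
>             // document reader
>             PDDocument doc = PDDocument.load(filename);
>             try {
>                 // create stripper (wish I had the power to do that - wouldn't leave the house)
>                 PDFTextStripper stripper = new PDFTextStripper();
>                 // get text from doc using stripper
>                 return stripper.getText(doc);
>             } finally {
>                 // always release the underlying file
>                 doc.close();
>             }
>         }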
>
> Sachin
>
> -----Original Message-----
> From: ashwin kumar [mailto:[EMAIL PROTECTED]
> Sent: 08 March 2007 09:37
> To: java-user@lucene.apache.org
> Subject: indexing pdfs
>
> Hi, can someone help me by giving any sample programs for indexing
> PDFs and .doc files?
>
> thanks
> regards
> ashwin
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
