Re: Extracting Complete Text from PDF using Lucene and JPEDAL!!!!

Mikael Sï¿½derman Mon, 14 Oct 2002 03:11:00 -0700

Hi Vin!

With JPedal you process one page at a time by calling the method decodePage
and supply the number of the page you want to process as argument.


In the example ExtractTextObjects the total number of pages is hard-coded to
1 (the variable end is set to 1 in the constructor), try to set the number
of pages by using the getPageCount method instead.

Best regards

Mikael Sï¿½derman

PS. Don't forget to always call flushObjectValues when done with a page.
This will make JPedal reuse memory.


----- Original Message -----
From: "Vinod Bhagat" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Monday, October 14, 2002 11:26 AM
Subject: Extracting Complete Text from PDF using Lucene and JPEDAL!!!!


> Dear People
>
>   I am using Lucene and one of the requirement is to index PDF. I am using
> JPEDAL's  API to extract text from PDF.  Till now i manage to get the text
> of the first page, I am using the ExtractTextObject.java class to do the
> above. But i want to extract the complete text of the PDF file. Have
anyone
> done this and possible could guide me towards it.
>
>  Appritiate for your positive and quick reply.
>
>  Cheers
> Vin.
>
> --
> To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>
>
>



--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Re: Extracting Complete Text from PDF using Lucene and JPEDAL!!!!

Reply via email to