Hi,

I am trying to read multiple pages from a PDF. To do that, I changed the start and end parameters in the ExtractTextObjects class, but after successfully reading the text from the first page it gives the following error.
======================ERROR=======================================
Processing content from page 2
Reading resources object 2 0 R
Reading fonts
java.lang.NullPointerException
        at org.jpedal.fonts.PdfFontsData.putWidth(PdfFontsData.java:696)
        at org.jpedal.fonts.PdfFontsData.readAllTypeFont(PdfFontsData.java:447)
        at org.jpedal.PdfObjects.readFonts(PdfObjects.java:581)
        at org.jpedal.PdfObjects.readResources(PdfObjects.java:468)
        at org.jpedal.PdfDecoder.decodePage(PdfDecoder.java:176)
        at org.jpedal.examples.ExtractTextObjectsNEW.<init>(ExtractTextObjectsNEW.java:145)
        at org.jpedal.examples.ExtractTextObjectsNEW.main(ExtractTextObjectsNEW.java:259)
Exception java.lang.NullPointerException reading font
=============================================================

It reads the first page without any problem, but when it iterates over the subsequent pages it fails with the NullPointerException. Has anyone encountered something like this? Am I missing something?

At the moment I am hard-coding start = 1 and end = 10 for the number of pages, but it gives the error. I tried to use the getPageCount() method declared in PdfDecoder.java, but this method always returns 0 as the count.
I am using the following code:

        //decode_pdf = new PdfDecoder( false );
        //--------------------------Lines ADDED----------------------------------
        decode_pdf = new PdfDecoder( true );
        pageCount = decode_pdf.getPageCount();
        if (pageCount > start) {
            end = pageCount;
        }
        System.out.println( "TOTAL PAGE COUNT IS =================== :" + pageCount );
        //------------------------------------------------------------
        /**
         * open the file (and read metadata including pages in file)
         */
        System.out.println( "Opening NEW file :" + file_name );
        decode_pdf.openPdfFile( file_name );
    }
    catch( Exception e )
    {
        System.err.println( "Exception " + e + " in pdf code" );
        System.exit( 1 );
    }

I flush each page object at the end:

    decode_pdf.flushObjectValues( true );

I would appreciate a quick and positive reply.

Best Regards.
vin.

-----Original Message-----
From: Mikael Söderman [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 14, 2002 12:37 PM
To: Lucene Users List
Subject: Re: Extracting Complete Text from PDF using Lucene and JPEDAL!!!!

Hi Vin!

With JPedal you process one page at a time by calling the method decodePage and supplying the number of the page you want to process as the argument. In the example ExtractTextObjects the total number of pages is hard-coded to 1 (the variable end is set to 1 in the constructor); try to set the number of pages by using the getPageCount method instead.

Best regards
Mikael Söderman

PS. Don't forget to always call flushObjectValues when done with a page. This will make JPedal reuse memory.

----- Original Message -----
From: "Vinod Bhagat" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Monday, October 14, 2002 11:26 AM
Subject: Extracting Complete Text from PDF using Lucene and JPEDAL!!!!

> Dear People
>
> I am using Lucene and one of the requirements is to index PDFs. I am using
> JPEDAL's API to extract text from PDF. So far I have managed to get the text
> of the first page, using the ExtractTextObject.java class to do the
> above.
> But I want to extract the complete text of the PDF file. Has anyone
> done this and could possibly guide me towards it?
>
> I would appreciate a quick and positive reply.
>
> Cheers
> Vin.
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>