I'm working with the PdfTextExtractor to extract text from a specific region of 
a PDF document using the code below.

PdfReader reader = new PdfReader(strPdfIn);
int iPages = reader.getNumberOfPages();

Rectangle rect = new Rectangle(10, 10, 30, 360);
RenderFilter filter = new RegionTextRenderFilter(rect);
TextExtractionStrategy strategy = new FilteredTextRenderListener(
        new LocationTextExtractionStrategy(), filter);

for (int i = 1; i <= iPages; i++) {
    System.out.println("Getting Page:[" + i + "] of [" + iPages + "]");
    String strExtract = PdfTextExtractor.getTextFromPage(reader, i, strategy);
    System.out.println("strExtract:[" + strExtract + "]");
}


The source PDF has 31,576 pages.  Everything works until page 26,402, where 
the following exception is thrown.


ExceptionConverter:
Completed...com.itextpdf.text.exceptions.InvalidPdfException: '>' not expected at file pointer 191980
    at com.itextpdf.text.pdf.PRTokeniser.throwError(PRTokeniser.java:205)
    at com.itextpdf.text.pdf.PRTokeniser.nextToken(PRTokeniser.java:358)
    at com.itextpdf.text.pdf.PdfContentParser.nextValidToken(PdfContentParser.java:196)
    at com.itextpdf.text.pdf.PdfContentParser.readPRObject(PdfContentParser.java:166)
    at com.itextpdf.text.pdf.PdfContentParser.parse(PdfContentParser.java:89)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:365)
    at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:79)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:73)

I isolated the error to a single page, which consists of a TIFF image with no 
visible text.  The document opens in Acrobat without errors.  I extracted the 
failing page with Acrobat Pro X, and the extracted page still triggers the 
exception.  In another PDF document, a page that has text alongside a TIFF 
image fails with the exact same error.
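
For anyone trying to reproduce this, here is a minimal sketch of how failing 
pages can be isolated without aborting the whole run.  The names 
PageIsolation, PageExtractor and findFailingPages are hypothetical and not 
part of iText; the stand-in extractor in main only simulates a parse failure 
so the sketch runs without the library.

```java
import java.util.ArrayList;
import java.util.List;

class PageIsolation {

    /** Hypothetical stand-in for a per-page extraction call (not an iText type). */
    interface PageExtractor {
        String extract(int pageNumber) throws Exception;
    }

    /**
     * Runs the extractor on pages 1..pageCount and collects the numbers of
     * the pages that threw, instead of stopping at the first failure.
     */
    static List<Integer> findFailingPages(int pageCount, PageExtractor extractor) {
        List<Integer> failing = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            try {
                extractor.extract(page);
            } catch (Exception e) {
                // Record the page and keep going with the next one.
                System.err.println("Page " + page + " failed: " + e.getMessage());
                failing.add(page);
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        // Simulated extractor: pretends pages 3 and 5 have unparsable content.
        PageExtractor fake = page -> {
            if (page == 3 || page == 5) {
                throw new IllegalStateException("'>' not expected");
            }
            return "text of page " + page;
        };
        System.out.println("Failing pages: " + findFailingPages(6, fake));
    }
}
```

With iText on the classpath, the lambda would presumably wrap the 
PdfTextExtractor.getTextFromPage(reader, page, strategy) call from the loop 
above, ideally with a new strategy object per page so that text from one page 
cannot carry over into the next.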

It is my understanding that the PDF was generated by HP Exstream.

Any suggestions would be greatly appreciated.

Bill
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
