[ https://issues.apache.org/jira/browse/PDFBOX-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Barrett updated PDFBOX-1515:
--------------------------------
Attachment: The right to take risks.pdf
The exception can be reproduced with this PDF file, using straightforward code
along the lines of:
parser.parse();
cosDoc = parser.getDocument();
pdfStripper = new PDFTextStripper();
pdDoc = new PDDocument(cosDoc);
parsedText = pdfStripper.getText(pdDoc);
The exception occurs once getText is called.
> PDGraphicsState class receives null page argument, leading to
> NullPointerException
> ---------------------------------------------------------------------------------
>
> Key: PDFBOX-1515
> URL: https://issues.apache.org/jira/browse/PDFBOX-1515
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel, Utilities
> Affects Versions: 1.7.1
> Environment: all (os-x, ubuntu linux, win-32, win64)
> Reporter: Tim Barrett
> Priority: Critical
> Attachments: The right to take risks.pdf
>
>
> Workaround changes needed for the PDGraphicsState constructor, as reproduced below:
> public PDGraphicsState(PDRectangle page) {
>     /*
>      * TB - changes made here are a workaround which creates a default
>      * GeneralPath assigned to currentClippingPath if the constructor
>      * argument page is null. Probably a better remedy would be to ensure
>      * that the page argument is not null, or to use a dedicated
>      * constructor if page is null.
>      */
>     if (page != null) {
>         // Clip to the page rectangle (the original assigned this path twice)
>         currentClippingPath = new GeneralPath(new Rectangle(page.createDimension()));
>         if (page.getLowerLeftX() != 0 || page.getLowerLeftY() != 0) {
>             // Compensate for offset
>             this.currentTransformationMatrix = this.currentTransformationMatrix.multiply(
>                 Matrix.getTranslatingInstance(-page.getLowerLeftX(), -page.getLowerLeftY()));
>         }
>     } else {
>         // No page available: fall back to an empty clipping path
>         currentClippingPath = new GeneralPath();
>     }
> }
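The comment in the workaround above mentions a dedicated constructor as a possibly better remedy. A minimal sketch of that idea follows. It is not PDFBox code: GraphicsStateSketch is a hypothetical stand-in for PDGraphicsState, java.awt.Rectangle stands in for PDRectangle, and the two translate fields stand in for the transformation matrix.

```java
import java.awt.Rectangle;
import java.awt.geom.GeneralPath;

// Hypothetical stand-in for PDGraphicsState; none of this is PDFBox code.
class GraphicsStateSketch {
    final GeneralPath currentClippingPath;
    // Translation that compensates for a page whose lower-left corner is not (0, 0).
    final double translateX;
    final double translateY;

    // Dedicated constructor for the "no page" case: empty clipping path, no offset.
    GraphicsStateSketch() {
        this.currentClippingPath = new GeneralPath();
        this.translateX = 0;
        this.translateY = 0;
    }

    // Page-based constructor: rejects null instead of silently accepting it.
    GraphicsStateSketch(Rectangle page) {
        if (page == null) {
            throw new IllegalArgumentException(
                "page must not be null; use the no-argument constructor");
        }
        // Clip to a rectangle of the page's size, anchored at the origin.
        this.currentClippingPath = new GeneralPath(new Rectangle(page.width, page.height));
        this.translateX = -page.x;
        this.translateY = -page.y;
    }
}
```

With this split, a caller that has no page must say so explicitly by choosing the no-argument constructor, so a null page reaching the other constructor fails at once with a clear message.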
> Also, as a side effect of the above workaround, made the following change within
> PDFStreamEngine.processEncodedText:
> /*
>  * TB - needed to make a change here, as we encounter a knock-on effect of
>  * allowing null page arguments through in the PDGraphicsState constructor,
>  * which creates a default GeneralPath assigned to currentClippingPath.
>  * That workaround causes findMediaBox to return null, so in that case we
>  * assign default values to pageHeight and pageWidth here. Everything else
>  * seems to work as far as text extraction is concerned.
>  */
> if (page.findMediaBox() != null) {
>     pageHeight = page.findMediaBox().getHeight();
>     pageWidth = page.findMediaBox().getWidth();
> } else {
>     pageHeight = 0;
>     pageWidth = 0;
> }
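The fallback above can also be written so that the media box is looked up only once. The following is a sketch under stated assumptions rather than a patch: PageSizeSketch is a hypothetical helper, and java.awt.geom.Rectangle2D stands in for whatever findMediaBox returns.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper; Rectangle2D stands in for the media box type.
class PageSizeSketch {
    final float pageWidth;
    final float pageHeight;

    // Takes the result of a single media box lookup, which may be null.
    PageSizeSketch(Rectangle2D mediaBox) {
        if (mediaBox != null) {
            this.pageWidth = (float) mediaBox.getWidth();
            this.pageHeight = (float) mediaBox.getHeight();
        } else {
            // No media box available: default both dimensions to zero.
            this.pageWidth = 0;
            this.pageHeight = 0;
        }
    }
}
```

Passing the lookup result in once avoids repeating the call for the null check and for each dimension.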