[
https://issues.apache.org/jira/browse/PDFBOX-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Wayne Holmes updated PDFBOX-5025:
--------------------------------------
Attachment: issue9418.pdf
> BaseParser fails when a number is followed by a string starting with 'e'
> ------------------------------------------------------------------------
>
> Key: PDFBOX-5025
> URL: https://issues.apache.org/jira/browse/PDFBOX-5025
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.21
> Reporter: Cody Wayne Holmes
> Priority: Major
> Attachments: issue9418.pdf
>
>
> I have found an issue in the latest version of PDFBox: parsing fails in
> BaseParser when `parseDirObject` parses a number and the token that
> immediately follows it starts with an 'e'.
>
> The cause is the support for numbers in scientific notation. When a number
> is directly followed by the `endobj` keyword, the leading 'e' is read as
> part of the number. Such PDFs are invalid, because they have no whitespace
> (e.g. a newline) between the number and `endobj`, but the failure can
> still be prevented.
>
> One way that seems to resolve the problem is to check whether the last
> character of the number string that was read is an 'e' or 'E'. If it is,
> removing it from the string and unreading it from the source allows
> parsing to complete as expected.
>
> {code:java}
> private COSNumber parseCOSNumber() throws IOException
> {
>     ...
>     // A trailing 'e' or 'E' is not part of the number (e.g. it starts
>     // "endobj"): remove it and push it back to the source
>     char lastc = buf.charAt(buf.length() - 1);
>     if (lastc == 'e' || lastc == 'E')
>     {
>         buf.deleteCharAt(buf.length() - 1);
>         seqSource.unread(lastc);
>     }
>     return COSNumber.get(buf.toString());
> }
> {code}
>
> An example of this error can be seen in PDF.js issue3323.
> [https://github.com/mozilla/pdf.js/blob/4ba28de2608866dcb10d627d77dc19ff3d017c17/test/pdfs/issue3323.pdf]
> Some more PDFs are attached to this issue as well.
>
>
> [https://github.com/mozilla/pdf.js/commit/26f5b1b2d37c7b74a073dee75d66fcc04fae10e8]
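
For illustration, the idea described above can be tried outside PDFBox with a plain `java.io.PushbackInputStream` standing in for the parser's source. This is only a minimal sketch under that assumption, not PDFBox's actual code: the names `TrailingExponentSketch`, `readNumberToken` and `source` are hypothetical, and the set of characters accepted as part of a number is simplified.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not part of PDFBox.
class TrailingExponentSketch
{
    // Wraps a string in a stream that can take back two bytes:
    // the terminating character and a trailing 'e'/'E'.
    static PushbackInputStream source(String s)
    {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new PushbackInputStream(new ByteArrayInputStream(bytes), 2);
    }

    // Reads the characters of a number, including 'e'/'E' for scientific
    // notation, and gives back a trailing 'e'/'E' that has no exponent digits.
    static String readNumberToken(PushbackInputStream in) throws IOException
    {
        StringBuilder buf = new StringBuilder();
        int c = in.read();
        while (c != -1 && ((c >= '0' && c <= '9') || c == '-' || c == '+'
                || c == '.' || c == 'e' || c == 'E'))
        {
            buf.append((char) c);
            c = in.read();
        }
        if (c != -1)
        {
            in.unread(c); // first character that is not part of the number
        }
        if (buf.length() > 0)
        {
            char lastc = buf.charAt(buf.length() - 1);
            if (lastc == 'e' || lastc == 'E')
            {
                // The 'e' belongs to the next token (e.g. "endobj")
                buf.deleteCharAt(buf.length() - 1);
                in.unread(lastc);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException
    {
        PushbackInputStream in = source("42endobj");
        System.out.println(readNumberToken(in)); // prints "42"

        StringBuilder rest = new StringBuilder();
        for (int c = in.read(); c != -1; c = in.read())
        {
            rest.append((char) c);
        }
        System.out.println(rest); // prints "endobj"

        // A real exponent is left untouched
        String sci = readNumberToken(source("1.5e3 "));
        System.out.println(Double.parseDouble(sci)); // prints "1500.0"
    }
}
```

The pushback buffer is sized 2 because, in the failing case, both the character that ended the number and the stray 'e' have to be returned to the stream. The sketch only handles a single trailing 'e' or 'E'; inputs such as a dangling sign after the exponent marker are not covered.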