[ https://issues.apache.org/jira/browse/PDFBOX-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler closed PDFBOX-5025.
--------------------------------------

> BaseParser fails when a number is followed by a string starting with 'e'
> ------------------------------------------------------------------------
>
>                 Key: PDFBOX-5025
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5025
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.21, 2.0.32, 3.0.3 PDFBox
>            Reporter: Cody Wayne Holmes
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
>         Attachments: issue2931.pdf, issue3323.pdf
>
>
> I have found an issue in the latest version of PDFBox: parsing fails in
> the BaseParser when `parseDirObject` parses a number and the text that
> follows it starts with an 'e'.
>
> This happens because the parser accepts 'e' and 'E' as part of a number in
> order to support scientific notation, so when the number is immediately
> followed by the 'endobj' keyword, the leading 'e' of the keyword is read as
> part of the number. Such files are invalid PDFs, since they have no newline
> between the number and the 'endobj' keyword, but the failure can be
> prevented.
>
> One way that seems to resolve the problem is to check whether the last
> character of the number string that was read is an 'e' or 'E'. If it is,
> removing it from the string and unreading it from the source allows
> parsing to complete as expected.
>
> {code:java}
> private COSNumber parseCOSNumber() throws IOException
> {
>     ...
>     // A trailing 'e'/'E' has no exponent digits, so it is not part of the
>     // number: drop it and push it back for the next token
>     char lastc = buf.charAt(buf.length() - 1);
>     if (lastc == 'e' || lastc == 'E')
>     {
>         buf.deleteCharAt(buf.length() - 1);
>         seqSource.unread(lastc);
>     }
>     return COSNumber.get(buf.toString());
> }
> {code}
>
> An example of this error can be seen in PDF.js issue3323.
> [https://github.com/mozilla/pdf.js/blob/4ba28de2608866dcb10d627d77dc19ff3d017c17/test/pdfs/issue3323.pdf]
> Some more PDFs are attached to this issue as well.
>
> [https://github.com/mozilla/pdf.js/commit/26f5b1b2d37c7b74a073dee75d66fcc04fae10e8]
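
The workaround described in the report can be tried outside PDFBox. The following is only a minimal sketch under stated assumptions: it uses java.io.PushbackReader in place of PDFBox's source class, and the names TrailingExponentSketch, readNumberToken, isNumberChar and split are hypothetical helpers made up for this illustration, not PDFBox API. It does not validate the number and is not the fix that was committed.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TrailingExponentSketch
{
    // Characters accepted greedily as part of a numeric token (assumption for this sketch).
    static boolean isNumberChar(char c)
    {
        return Character.isDigit(c) || c == '+' || c == '-' || c == '.' || c == 'e' || c == 'E';
    }

    // Hypothetical helper: reads a numeric token and gives back a trailing 'e'/'E'.
    static String readNumberToken(PushbackReader in) throws IOException
    {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && isNumberChar((char) c))
        {
            buf.append((char) c);
        }
        if (c != -1)
        {
            in.unread(c); // first character that does not belong to the token
        }
        if (buf.length() > 0)
        {
            char last = buf.charAt(buf.length() - 1);
            if (last == 'e' || last == 'E')
            {
                // No exponent digits followed, so this character starts the next token.
                buf.deleteCharAt(buf.length() - 1);
                in.unread(last);
            }
        }
        return buf.toString();
    }

    // Demo wrapper: returns { number token, remaining input }.
    static String[] split(String input)
    {
        // Pushback size 2: one slot for the terminating character, one for the stray 'e'.
        try (PushbackReader in = new PushbackReader(new StringReader(input), 2))
        {
            String number = readNumberToken(in);
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = in.read()) != -1)
            {
                rest.append((char) c);
            }
            return new String[] { number, rest.toString() };
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        String[] r = split("42endobj");
        System.out.println(r[0] + " | " + r[1]); // prints "42 | endobj"
    }
}
```

A number with a complete exponent is left alone: split("1.5E3 Tf") yields "1.5E3" and " Tf". The pushback buffer needs room for two characters because both the terminating character and the stray 'e' may have to be returned to the stream.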