[ 
https://issues.apache.org/jira/browse/PDFBOX-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-5025.
--------------------------------------

> BaseParser fails when a number is followed by a string starting with 'e'
> ------------------------------------------------------------------------
>
>                 Key: PDFBOX-5025
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5025
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.21, 2.0.32, 3.0.3 PDFBox
>            Reporter: Cody Wayne Holmes
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
>         Attachments: issue2931.pdf, issue3323.pdf
>
>
> I have found an issue in the latest version of PDFBox where parsing fails in 
> the BaseParser when `parseDirObject` parses a number and the following string 
> starts with an 'e'.
>   
>  This happens because the number parser accepts 'e' and 'E' in order to 
> support scientific notation, so it consumes the first character of the 
> `endobj` keyword when that keyword directly follows a number. Such PDFs are 
> invalid, since they lack a newline between the number and the `endobj` 
> keyword, but the failure can be prevented.
>   
>  One way that seems to resolve the problem is to check whether the last 
> character of the number string that was read is an 'e' or 'E'. If it is, 
> removing it from the string and unreading it to the source allows 
> parsing to complete as expected.
>   
> {code:java}
> private COSNumber parseCOSNumber() throws IOException
> {
>   ...
>   // If the number ends with 'e' or 'E', push that character back to the source
>   char lastc = buf.charAt(buf.length() - 1);
>   if (lastc == 'e' || lastc == 'E')
>   { 
>     buf.deleteCharAt(buf.length() - 1);
>     seqSource.unread(lastc);
>   }
>   return COSNumber.get(buf.toString());
> }
> {code}
>   
>  An example of this error can be seen in PDF.js issue3323.
> [https://github.com/mozilla/pdf.js/blob/4ba28de2608866dcb10d627d77dc19ff3d017c17/test/pdfs/issue3323.pdf]
> More PDFs are attached to this issue.
>   
>  
> [https://github.com/mozilla/pdf.js/commit/26f5b1b2d37c7b74a073dee75d66fcc04fae10e8]
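
For illustration, the approach described in the report can be sketched outside of PDFBox. This is a minimal, self-contained sketch and not the PDFBox code: the class `TrailingExponentDemo` and the methods `readNumberToken` and `isNumberChar` are hypothetical names, and `java.io.PushbackReader` stands in for the parser's source. It handles only a single trailing 'e' or 'E'.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical demo class; it does not use or reproduce any PDFBox API.
class TrailingExponentDemo
{
    // Reads the characters that may belong to a number, including an
    // exponent marker, then gives a dangling 'e'/'E' back to the reader.
    static String readNumberToken(PushbackReader in) throws IOException
    {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && isNumberChar(c))
        {
            buf.append((char) c);
        }
        if (c != -1)
        {
            in.unread(c); // first character that is not part of the number
        }
        // A trailing 'e' or 'E' has no exponent digits after it, so it
        // belongs to whatever follows the number (for example "endobj").
        if (buf.length() > 0)
        {
            char last = buf.charAt(buf.length() - 1);
            if (last == 'e' || last == 'E')
            {
                buf.deleteCharAt(buf.length() - 1);
                in.unread(last);
            }
        }
        return buf.toString();
    }

    static boolean isNumberChar(int c)
    {
        return (c >= '0' && c <= '9') || c == '+' || c == '-'
                || c == '.' || c == 'e' || c == 'E';
    }

    public static void main(String[] args) throws IOException
    {
        // Pushback capacity of 2: one slot for the terminating character,
        // one for the 'e' that is handed back.
        PushbackReader in = new PushbackReader(new StringReader("12endobj"), 2);
        System.out.println(readNumberToken(in)); // prints "12"

        StringBuilder rest = new StringBuilder();
        int c;
        while ((c = in.read()) != -1)
        {
            rest.append((char) c);
        }
        System.out.println(rest); // prints "endobj"
    }
}
```

The pushback buffer needs room for two characters here, because both the character that ended the scan and the trailing 'e' are returned to the reader before the next token is read.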



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
