[ https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064937#comment-17064937 ]
Andreas Lehmkühler commented on PDFBOX-4800:
--------------------------------------------

[~cryptomathic_epe] that piece of code doesn't parse all kinds of numbers; it is limited to object numbers and offsets, which are positive numbers. Unless I missed something, we are safe here.

> Parsing of numbers does not always terminate at actual end of number
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-4800
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4800
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.12, 2.0.15, 2.0.19
>            Reporter: Eckhart Pedersen
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.20, 3.0.0 PDFBox
>
>         Attachments: 1584634522723.txt, demobank_case_error_doc1.pdf, demobank_case_ok_doc1.pdf
>
>
> *Short description:*
> The method *readStringNumber* in *BaseParser.java* reads more characters than desired when parsing numbers in certain documents. We have internally fixed the issue by adding the line marked {{// added}} below:
> {code:java}
> while ((lastByte = seqSource.read()) != ASCII_SPACE &&
>         lastByte != ASCII_LF &&
>         lastByte != ASCII_CR &&
>         lastByte != 60 &&   // '<', see sourceforge bug 1714707
>         lastByte != '[' &&  // PDFBOX-1845
>         lastByte != '(' &&  // PDFBOX-2579
>         lastByte != 0 &&    // see sourceforge bug 853328
>         lastByte != '/' &&  // added
>         lastByte != -1)
> {
>     ...
> }
> {code}
>
> *Background:*
> Our customer ran into an issue with certain documents that were converted to PDF/A-2 format with Qoppa jPDFPreflight ([https://www.qoppa.com/pdfpreflight/]). In some instances PDFBox would afterwards fail to open the document.
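As a hedged illustration of why the extra '/' check matters, the Java sketch below models only the termination test of the loop quoted above; it is not PDFBox code. `NumberTokenSketch`, `readNumberToken`, `tokenOf` and the three predicate names are hypothetical, the inputs are made up rather than taken from the attached PDFs, and pushing the terminator back onto the stream is an assumption about how the parser hands it to the next token. `QUOTED_LOOP` mirrors the quoted conditions, `REPORTED_FIX` adds '/', and `ISO_32000_1` stops at every white-space and delimiter character defined in ISO 32000-1 section 7.2.2.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.IntPredicate;

// Hypothetical sketch, not PDFBox code: models only when number reading stops.
public class NumberTokenSketch {

    // Terminators checked by the loop as quoted in the report
    // (space, LF, CR, '<', '[', '(' and NUL); end of input is handled separately.
    static final IntPredicate QUOTED_LOOP =
            b -> b == ' ' || b == '\n' || b == '\r' || b == '<' || b == '[' || b == '(' || b == 0;

    // The same set plus the '/' terminator from the reporter's internal fix.
    static final IntPredicate REPORTED_FIX = QUOTED_LOOP.or(b -> b == '/');

    // Every white-space and delimiter character listed in ISO 32000-1, section 7.2.2.
    static final IntPredicate ISO_32000_1 = b -> {
        switch (b) {
            case 0x00: case 0x09: case 0x0A: case 0x0C: case 0x0D: case 0x20: // white-space
            case '(': case ')': case '<': case '>': case '[': case ']':
            case '{': case '}': case '/': case '%':                          // delimiters
                return true;
            default:
                return false;
        }
    };

    // Reads bytes until a terminator or end of input. The terminator is pushed back
    // so the next token starts with it (assumed behavior, for illustration only).
    static String readNumberToken(PushbackInputStream in, IntPredicate isTerminator)
            throws IOException {
        StringBuilder token = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && !isTerminator.test(b)) {
            token.append((char) b);
        }
        if (b != -1) {
            in.unread(b);
        }
        return token.toString();
    }

    // Convenience wrapper for in-memory ASCII input.
    static String tokenOf(String input, IntPredicate isTerminator) {
        byte[] bytes = input.getBytes(StandardCharsets.US_ASCII);
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(bytes))) {
            return readNumberToken(in, isTerminator);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenOf("12/Type", QUOTED_LOOP));  // 12/Type
        System.out.println(tokenOf("12/Type", REPORTED_FIX)); // 12
        // TAB is not a terminator in the quoted loop, so this yields "34<TAB>0".
        System.out.println(tokenOf("34\t0 R", REPORTED_FIX));
        System.out.println(tokenOf("34\t0 R", ISO_32000_1));  // 34
    }
}
```

With the quoted conditions, a name that follows a number without intervening white-space is swallowed into the number token; adding '/' stops at the right place. Per the comment above, the routine is only used for object numbers and offsets, so the sketch ignores signs and decimal points.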
> (It is possible that the Qoppa conversion tool does something wrong and that the resulting PDF is invalid somehow, but all other tools seem to open the converted documents without any problems. We are not PDF experts, so this is difficult for us to judge. If you determine that the problematic PDF document is incorrect somehow, please notify us so that we can create a bug report at Qoppa too.)
> I am attaching both an original version of the document (which PDFBox can open just fine) and the converted version (which PDFBox cannot parse correctly).
>
> *Additional information:*
> My colleague refers to ISO 32000-1 section 7.2.2, which describes all valid white-space and delimiter characters for PDF. According to that list, the following characters should also be handled in the readStringNumber method: '%', '\{', ')', ']', '}', '>', FORM FEED, and HORIZONTAL TAB.
> Again, as we are not experts on the PDF standard, we recommend that you check the mentioned standard yourself and decide what kind of solution you want to implement (if any).
>
> *Final note:*
> We are filing this bug report in the hope that you find it helpful. I have tried to include all relevant information as well as I can; if you have further questions, I will be happy to answer them.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org