[jira] [Commented] (PDFBOX-4800) Parsing of numbers does not always terminate at actual end of number

Jira Mon, 23 Mar 2020 09:45:08 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064937#comment-17064937
 ]


Andreas Lehmkühler commented on PDFBOX-4800:
--------------------------------------------

[~cryptomathic_epe] that piece of code doesn't parse all kinds of numbers; it 
is limited to object numbers and offsets, which are positive numbers. As long 
as I didn't miss anything, we are safe here.
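
To illustrate the point about object numbers and offsets, here is a minimal Java sketch of a reader that accepts only ASCII digits and therefore stops at the first character that cannot belong to a positive integer. It is not the PDFBox code: the class UnsignedNumberReader and the method readUnsigned are hypothetical names, and the sketch works on a byte array instead of the parser's input source.

```java
/** Hypothetical helper for illustration only; not part of PDFBox. */
final class UnsignedNumberReader {
    private UnsignedNumberReader() {}

    /**
     * Parses the run of ASCII digits that begins at index start.
     * Returns the parsed value, or -1 if there is no digit at start.
     * Throws ArithmeticException if the value does not fit in a long.
     */
    static long readUnsigned(byte[] data, int start) {
        int i = start;
        long value = 0;
        while (i < data.length && data[i] >= '0' && data[i] <= '9') {
            // Any non-digit (sign, '.', '/', white space, ...) ends the number.
            value = Math.addExact(Math.multiplyExact(value, 10L), data[i] - '0');
            i++;
        }
        return i == start ? -1 : value;
    }
}
```

With this approach no list of terminating characters is needed, at the cost of not handling signed or real numbers at all.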

> Parsing of numbers does not always terminate at actual end of number
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-4800
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4800
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.12, 2.0.15, 2.0.19
>            Reporter: Eckhart Pedersen
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.20, 3.0.0 PDFBox
>
>         Attachments: 1584634522723.txt, demobank_case_error_doc1.pdf, 
> demobank_case_ok_doc1.pdf
>
>
> *Short description:*
>  The method *readStringNumber* in *BaseParser.java* reads more characters 
> than desired when parsing numbers in certain documents. We have internally 
> fixed the issue by adding the following line (marked with "<-- added"):
> while( (lastByte = seqSource.read() ) != ASCII_SPACE &&
>              lastByte != ASCII_LF &&
>              lastByte != ASCII_CR &&
>              lastByte != 60 && //see sourceforge bug 1714707
>              lastByte != '[' && // PDFBOX-1845
>              lastByte != '(' && // PDFBOX-2579
>              lastByte != 0 && //See sourceforge bug 853328
>              lastByte != '/' && // <-- added
>              lastByte != -1 )
> {
> *Background:*
>  Our customer ran into an issue with certain documents that were converted to 
> PDF/A2 format with Qoppa jPDFPreflight 
> ([https://www.qoppa.com/pdfpreflight/]). In some instances pdfbox would 
> afterwards fail to open the document.
> (It is possible that the Qoppa conversion tool does something wrong and that 
> the resulting PDF is invalid somehow, but all other tools seem to open the 
> converted documents without any problems. We are not PDF experts, so this is 
> difficult for us to judge. If you determine that the problematic PDF document 
> is incorrect somehow, please notify us so that we can create a bug report at 
> Qoppa also.)
> I am attaching both an original version of the document (which pdfbox can 
> open just fine) and the converted version (which pdfbox cannot parse 
> correctly).
> *Additional information*
> My colleague refers to ISO 32000-1 section 7.2.2, which describes all valid 
> white-space and delimiter characters for PDF.
> According to the list of delimiter/white-space characters, the following 
> characters should also be handled in the readStringNumber method: '%', '{', 
> ')', ']', '}', '>', FORM FEED, and HORIZONTAL TAB.
> Though again, as we are not experts on the PDF standard we recommend that you 
> check the mentioned standard documents yourself and determine what kind of 
> solution you want to implement (if any).
> *Final Note:*
> We are filing this bug report in the hope that you find it helpful. I have 
> tried to include all relevant information as well as I can. If you have 
> further questions, I would be happy to address them.
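
The white-space and delimiter characters from ISO 32000-1 section 7.2.2 that the report refers to can be collected into a single end-of-number check. The following Java sketch only illustrates that idea and is not the fix applied in PDFBox: the class PdfTokenEnd and its methods are hypothetical names, and it scans a byte array rather than the seqSource used in BaseParser.

```java
/** Hypothetical helper for illustration only; not part of PDFBox. */
final class PdfTokenEnd {
    private PdfTokenEnd() {}

    /** White-space characters: NUL, HORIZONTAL TAB, LF, FORM FEED, CR, SPACE. */
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Delimiter characters: ( ) < > [ ] { } / % */
    static boolean isDelimiter(int c) {
        switch (c) {
            case '(': case ')': case '<': case '>':
            case '[': case ']': case '{': case '}':
            case '/': case '%':
                return true;
            default:
                return false;
        }
    }

    /** True if c (a byte value, or -1 for end of input) terminates a number. */
    static boolean endsNumber(int c) {
        return c == -1 || isWhitespace(c) || isDelimiter(c);
    }

    /**
     * Returns the index of the first byte at or after start that ends a
     * number token, or data.length if the token runs to the end of the data.
     */
    static int endOfNumber(byte[] data, int start) {
        int i = start;
        while (i < data.length && !endsNumber(data[i] & 0xFF)) {
            i++;
        }
        return i;
    }
}
```

For the ASCII bytes of "12/Type", endOfNumber(data, 0) stops at index 2, the position of the '/', so the name that follows the number is not consumed as part of it.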



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

