[ 
https://issues.apache.org/jira/browse/PDFBOX-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5156:
------------------------------------
    Fix Version/s: 2.0.24

> Error in identification of PDF comment symbol % as a token separator with PDF 
> names
> -----------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5156
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5156
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.23, 3.0.0 PDFBox
>            Reporter: Peter Wyatt
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.24, 3.0.0 PDFBox
>
>
> The DARPA-funded SafeDocs research program has developed a Compacted PDF 
> Syntax test case to stress-test PDF lexical analyzers/parsers. See 
> [https://github.com/pdf-association/safedocs/tree/main/CompactedSyntax]. The 
> output of this test PDF was examined in detail using the PDFBOX debugger 
> "view internal structure" feature for both the body and content stream and 
> this is the only error... so well done! 
> PDFBOX 3.0.0-RC1 was tested using this highly targeted test PDF and there is 
> an error in the lexical analysis (token separators) between PDF name objects 
> and PDF comments. As specified in ISO 32000-2:
>  * clause 7.2.3: "The delimiter characters (, ), <, >, [, ], /, and % are 
> special (LEFT PARENTHESIS (28h), RIGHT PARENTHESIS (29h), LESS-THAN SIGN 
> (3Ch), GREATER-THAN SIGN (3Eh), LEFT SQUARE BRACKET (5Bh), RIGHT SQUARE 
> BRACKET (5Dh), SOLIDUS (2Fh) and PERCENT SIGN (25h), respectively). They 
> delimit syntactic entities such as arrays, names, and comments. ... Any of 
> these delimiters terminates the entity preceding it and is not included in 
> the entity."
>  * clause 7.2.4 "Any occurrence of the PERCENT SIGN (25h) outside a string or 
> inside a content stream (see 7.8.2, "Content streams") introduces a comment."
> Offset 3561 (as reported in the output below) is in the middle of this 
> fragment of PDF: {{<</Root 1 0 R/Info%comment after name}}
> Note also that other/earlier versions of PDFBOX were not tested.
> {{java -jar pdfbox-app-3.0.0-RC1.jar debug 
> safedocs\CompactedSyntax\CompactedPDFSyntaxTest.pdf}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser 
> parseDirObject}}
> {{WARNING: Skipped unexpected dir object = 'after' at offset 3561}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser 
> parseCOSDictionaryNameValuePair}}
> {{WARNING: Bad dictionary declaration at offset 3562}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser 
> parseCOSDictionary}}
> {{WARNING: Invalid dictionary, found: 'n' but expected: '/' at offset 3562}}
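
For illustration only, the delimiter rule quoted from clause 7.2.3 can be
sketched as a tiny stand-alone name reader. This is not PDFBox code and not
the actual fix; the class and method names (NameDelimiterSketch, isDelimiter,
isWhitespace, readName) are hypothetical, and #xx escapes in names are
deliberately ignored. It only shows that a name token should end at '%' just
as it ends at white space or '/'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of PDFBox.
class NameDelimiterSketch {

    // Delimiter characters listed in ISO 32000-2 clause 7.2.3.
    static boolean isDelimiter(int c) {
        return "()<>[]/%".indexOf(c) >= 0;
    }

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Reads the name that starts at data[pos] (which must be '/') and
    // returns it without the leading solidus. The name ends at the first
    // white-space or delimiter character, including '%'.
    // Assumption: #xx escape sequences are not decoded in this sketch.
    static String readName(byte[] data, int pos) {
        if (data[pos] != '/') {
            throw new IllegalArgumentException("expected '/' at offset " + pos);
        }
        int end = pos + 1;
        while (end < data.length
                && !isWhitespace(data[end])
                && !isDelimiter(data[end])) {
            end++;
        }
        return new String(data, pos + 1, end - pos - 1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The fragment cited in the report.
        String fragment = "<</Root 1 0 R/Info%comment after name";
        byte[] data = fragment.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readName(data, fragment.indexOf("/Info"))); // prints Info
    }
}
```

With this rule the name token is "Info" and everything from '%' to the end
of the line is a comment, so no "after" token should reach the dictionary
parser.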


