[
https://issues.apache.org/jira/browse/PDFBOX-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-5156:
------------------------------------
Fix Version/s: 2.0.24
> Error in identification of PDF comment symbol % as a token separator with PDF
> names
> -----------------------------------------------------------------------------------
>
> Key: PDFBOX-5156
> URL: https://issues.apache.org/jira/browse/PDFBOX-5156
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.23, 3.0.0 PDFBox
> Reporter: Peter Wyatt
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.24, 3.0.0 PDFBox
>
>
> The DARPA-funded SafeDocs research program has developed a Compacted PDF
> Syntax test case to stress-test PDF lexical analyzers/parsers. See
> [https://github.com/pdf-association/safedocs/tree/main/CompactedSyntax]. The
> output of this test PDF was examined in detail using the PDFBox debugger
> "view internal structure" feature for both the file body and the content
> stream, and this is the only error... so well done!
> PDFBox 3.0.0-RC1 was tested against this highly targeted test PDF, and it
> shows an error in lexical analysis (token separation) when a PDF name object
> is immediately followed by a PDF comment. As specified in ISO 32000-2:
> * clause 7.2.3: "The delimiter characters (, ), <, >, [, ], /, and % are
> special (LEFT PARENTHESIS (28h), RIGHT PARENTHESIS (29h), LESS-THAN SIGN
> (3Ch), GREATER-THAN SIGN (3Eh), LEFT SQUARE BRACKET (5Bh), RIGHT SQUARE
> BRACKET (5Dh), SOLIDUS (2Fh) and PERCENT SIGN (25h), respectively). They
> delimit syntactic entities such as arrays, names, and comments. ... Any of
> these delimiters terminates the entity preceding it and is not included in
> the entity."
> * clause 7.2.4: "Any occurrence of the PERCENT SIGN (25h) outside a string or
> inside a content stream (see 7.8.2, "Content streams") introduces a comment."
> Offset 3561 (as reported in the output below) is in the middle of this
> fragment of PDF: {{<</Root 1 0 R/Info%comment after name}}
> Note also that other/earlier versions of PDFBox were not tested.
> {{java -jar pdfbox-app-3.0.0-RC1.jar debug safedocs\CompactedSyntax\CompactedPDFSyntaxTest.pdf}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser parseDirObject}}
> {{WARNING: Skipped unexpected dir object = 'after' at offset 3561}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSDictionaryNameValuePair}}
> {{WARNING: Bad dictionary declaration at offset 3562}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSDictionary}}
> {{WARNING: Invalid dictionary, found: 'n' but expected: '/' at offset 3562}}
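The name-termination rule from clause 7.2.3 can be illustrated with a small, self-contained Java sketch. This is not PDFBox's BaseParser code, and the class and method names (NameLexerSketch, readName, names) are hypothetical. It assumes a name ends at any of the six PDF white-space characters or any delimiter (the set quoted above, plus { and }, which the standard also lists as delimiters), and that a comment runs to the next CR or LF. It deliberately ignores strings, #xx escapes inside names, and all other token types, so it only demonstrates why /Info%comment should yield the name "Info" followed by a comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: terminating a PDF name at white space or any delimiter, including '%'. */
public class NameLexerSketch {

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    static boolean isWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
    }

    // Delimiters per clause 7.2.3; '{' and '}' are also delimiters in the standard.
    static boolean isDelimiter(int c) {
        switch (c) {
            case '(': case ')': case '<': case '>':
            case '[': case ']': case '{': case '}':
            case '/': case '%':
                return true;
            default:
                return false;
        }
    }

    /** Reads the name starting at data[pos] == '/' and returns it without the solidus (no #xx decoding). */
    static String readName(byte[] data, int pos) {
        int start = pos + 1; // skip the leading '/'
        int end = start;
        while (end < data.length) {
            int c = data[end] & 0xFF;
            if (isWhitespace(c) || isDelimiter(c)) {
                break; // the delimiter terminates the name and is not included in it
            }
            end++;
        }
        return new String(data, start, end - start, StandardCharsets.US_ASCII);
    }

    /** Collects every name in the input, skipping comments up to the end of the line. */
    static List<String> names(String text) {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int c = data[i] & 0xFF;
            if (c == '%') {
                // A comment extends to the next EOL marker (CR or LF).
                while (i < data.length && data[i] != '\r' && data[i] != '\n') {
                    i++;
                }
            } else if (c == '/') {
                String name = readName(data, i);
                result.add(name);
                i += 1 + name.length(); // one char per byte, since no #xx decoding is done
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String fragment = "<</Root 1 0 R/Info%comment after name\n";
        System.out.println(names(fragment)); // prints [Root, Info]
    }
}
```

With this rule, 'after' and 'name' stay inside the comment instead of being reported as unexpected objects, which is the behaviour the log above shows going wrong.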