Johan van der Knijff created PDFBOX-1674:
--------------------------------------------
Summary: Preflight doesn't correctly parse PDF if obj identifier
not followed by line terminator
Key: PDFBOX-1674
URL: https://issues.apache.org/jira/browse/PDFBOX-1674
Project: PDFBox
Issue Type: Bug
Components: Preflight
Affects Versions: 2.0.0
Environment: Win 7
Reporter: Johan van der Knijff
Priority: Minor
Fix For: 2.0.0
For some test files on the Adobe Acrobat Engineering website, Preflight output
looks like this:
<preflight name="Disney-Flash.pdf">
  <executionTimeMS>210</executionTimeMS>
  <isValid type="">false</isValid>
  <errors count="3">
    <error count="1">
      <code>1.0</code>
      <details>Syntax error, Expected pattern 'obj but missed at character 'o'</details>
    </error>
    <error count="1">
      <code>1.2.1</code>
      <details>Body Syntax error, Expected pattern 'obj but missed at character 'o'</details>
    </error>
    <error count="1">
      <code>1.2.1</code>
      <details>Body Syntax error, Single space expected</details>
    </error>
  </errors>
</preflight>
This suggests that Preflight doesn't parse the objects correctly. A look at
some of the offending PDFs in a hex editor confirms it: the object
identifiers in them are not terminated by any EOL character(s). AFAIK this is
allowed in both PDF and PDF/A-1. More details and links to test files are
here (see the 'Multimedia' table and below):
http://www.openplanetsfoundation.org/blogs/2013-07-25-identification-pdf-preservation-risks-sequel
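
The hex-editor check described above can be approximated with a short script.
What follows is only a minimal Python sketch, not Preflight's parser: the
function name objects_without_eol and the sample bytes are made up for
illustration, and the naive regex scan may also match byte sequences that
happen to occur inside stream data.

```python
import re

# "<object number> <generation number> obj", as it appears in a PDF body.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj")


def objects_without_eol(data: bytes):
    """Return (object number, generation, byte offset) for every 'obj'
    keyword that is not immediately followed by CR or LF."""
    hits = []
    for match in OBJ_HEADER.finditer(data):
        next_byte = data[match.end():match.end() + 1]
        if next_byte not in (b"\r", b"\n"):
            hits.append((int(match.group(1)), int(match.group(2)), match.start()))
    return hits


# Hypothetical sample: object 1 is followed by an EOL, objects 2 and 3 are not.
sample = (
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj<< /Length 0 >>\nendobj\n"
    b"3 0 obj <<>>\nendobj\n"
)

for number, generation, offset in objects_without_eol(sample):
    print(f"object {number} {generation} at offset {offset}: no EOL after 'obj'")
```

To try it on a real file, read the file in binary mode and pass the bytes to
objects_without_eol; an empty result means every obj keyword found by the scan
is followed by an EOL.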