Johan van der Knijff created PDFBOX-1674:
--------------------------------------------

             Summary: Preflight doesn't correctly parse PDF if obj identifier not followed by line terminator
                 Key: PDFBOX-1674
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1674
             Project: PDFBox
          Issue Type: Bug
          Components: Preflight
    Affects Versions: 2.0.0
         Environment: Win 7
            Reporter: Johan van der Knijff
            Priority: Minor
             Fix For: 2.0.0


For some test files on the Adobe Acrobat Engineering website, Preflight output 
looks like this:

<preflight name="Disney-Flash.pdf">
  <executionTimeMS>210</executionTimeMS>
  <isValid type="">false</isValid>
  <errors count="3">
    <error count="1">
      <code>1.0</code>
      <details>Syntax error, Expected pattern 'obj but missed at character 'o'</details>
    </error>
    <error count="1">
      <code>1.2.1</code>
      <details>Body Syntax error, Expected pattern 'obj but missed at character 'o'</details>
    </error>
    <error count="1">
      <code>1.2.1</code>
      <details>Body Syntax error, Single space expected</details>
    </error>
  </errors>
</preflight>

This suggests that Preflight doesn't parse these objects correctly. A look at
some of the offending PDFs in a hex editor confirms this: the object
identifiers in them are not followed by any EOL character(s). AFAIK this is
allowed in both PDF and PDF/A-1. More details and links to the test files are
here ('Multimedia' table and below):

http://www.openplanetsfoundation.org/blogs/2013-07-25-identification-pdf-preservation-risks-sequel
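To illustrate the distinction, here is a minimal Python sketch (not PDFBox code; `parse_obj_header` and `parse_obj_header_strict` are hypothetical names) of an indirect-object header check. The lenient version treats `obj` as a complete keyword when it is followed by an EOL, by other PDF white space, by a delimiter such as `<<`, or by the end of the data. The strict variant additionally insists on an EOL right after `obj`, which is the kind of check that would reject headers like the ones in these files. The white-space and delimiter sets are assumed from the general PDF syntax rules.

```python
import re

# White-space characters in PDF syntax: NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "
# Delimiter characters: any of these also ends a token such as "obj".
PDF_DELIMITERS = b"()<>[]{}/%"

_WS = rb"[\x00\t\n\x0c\r ]+"
# "<object number> <generation number> obj"
_HEADER = re.compile(rb"(\d+)" + _WS + rb"(\d+)" + _WS + rb"obj")


def parse_obj_header(data: bytes, offset: int = 0):
    """Return (object number, generation number, offset just after 'obj'),
    or None if no indirect-object header starts at `offset`.

    Lenient: 'obj' may be followed by an EOL, other white space,
    a delimiter (e.g. '<<'), or the end of the data.
    """
    m = _HEADER.match(data, offset)
    if m is None:
        return None
    end = m.end()
    # The keyword must end here; e.g. "object" is a different token.
    if end < len(data) and data[end] not in PDF_WHITESPACE + PDF_DELIMITERS:
        return None
    return int(m.group(1)), int(m.group(2)), end


def parse_obj_header_strict(data: bytes, offset: int = 0):
    """Hypothetical strict variant: also requires CR or LF right after 'obj'."""
    result = parse_obj_header(data, offset)
    if result is None:
        return None
    end = result[2]
    if end >= len(data) or data[end] not in b"\r\n":
        return None
    return result
```

For example, `parse_obj_header(b"12 0 obj<< /Type /Page >>")` returns `(12, 0, 8)`, while `parse_obj_header_strict` returns `None` for the same bytes because no EOL follows `obj`.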


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
