[jira] [Commented] (PDFBOX-4915) "Page tree root must be a dictionary" on PDDocument.load

Michael Klink (Jira) Wed, 15 Jul 2020 02:21:21 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158011#comment-17158011
 ]


Michael Klink commented on PDFBOX-4915:
---------------------------------------

There are some errors in the cross reference table of your PDF:

* It has multiple entries like this:
{noformat}
0000000000 00000 n
{noformat}
  These entries claim that the corresponding object can be found at the start 
(offset 0) of the file. But what actually sits there, after two comment lines, 
is object 1, which is not the object these entries are supposed to point to. 
Thus, these pointers are incorrect.
  If these entries are intended to mean something like "unused" or {{null}}, an 
{{... f}} entry should have been used.
* It has only a single cross reference table, no incremental updates, but it 
already contains generation 1 objects, e.g. object 1:
{noformat}
xref
0 1712
0000000000 65535 f 
0000000015 00001 n  
{noformat}
  This is invalid: in the first document revision there may only be generation 
0 objects:
{panel:title=ISO 32000-1, section 7.5.4 "Cross-Reference Table"}
Except for object number 0, all objects in the cross-reference table shall 
initially have generation numbers of 0.
{panel}
  This object 1 with generation 1 is actually the page tree root. PDFBox 
probably has problems with this invalid generation number.
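
The two checks described above can be sketched in a few lines of plain Java. This is only an illustration, not PDFBox code: the class {{XrefCheck}} and its method {{findProblems}} are hypothetical names, and the sketch assumes a classic (non-stream) cross reference table from a file with a single revision, so that any in-use entry with a non-zero generation is suspicious. It also assumes that no object can start at offset 0 because the file header sits there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of PDFBox): flags suspicious entries in a classic xref table. */
class XrefCheck {
    // One xref entry: 10-digit offset, 5-digit generation, then 'n' (in use) or 'f' (free).
    private static final Pattern ENTRY = Pattern.compile("^(\\d{10}) (\\d{5}) ([nf])\\s*$");
    // Subsection header: number of the first object, then the entry count.
    private static final Pattern SUBSECTION = Pattern.compile("^(\\d+) (\\d+)\\s*$");

    /** Returns one message per problem found; assumes a single-revision file. */
    static List<String> findProblems(String xrefSection) {
        List<String> problems = new ArrayList<>();
        int objNum = -1; // unknown until the first subsection header has been read
        for (String line : xrefSection.split("\\r?\\n|\\r")) {
            Matcher sub = SUBSECTION.matcher(line);
            if (sub.matches()) {
                objNum = Integer.parseInt(sub.group(1));
                continue;
            }
            Matcher entry = ENTRY.matcher(line);
            if (!entry.matches() || objNum < 0) {
                continue; // skips the "xref" keyword and anything unrecognized
            }
            long offset = Long.parseLong(entry.group(1));
            int generation = Integer.parseInt(entry.group(2));
            boolean inUse = entry.group(3).equals("n");
            if (inUse && offset == 0) {
                // Offset 0 holds the file header, so no object can start there.
                problems.add("object " + objNum + ": in-use entry points at offset 0");
            }
            if (inUse && generation != 0) {
                // Only valid after incremental updates, which this sketch assumes are absent.
                problems.add("object " + objNum + ": in-use entry has generation " + generation);
            }
            objNum++;
        }
        return problems;
    }

    public static void main(String[] args) {
        String sample = "xref\n0 3\n"
                + "0000000000 65535 f \n"
                + "0000000015 00001 n \n"
                + "0000000000 00000 n \n";
        for (String problem : findProblems(sample)) {
            System.out.println(problem);
        }
    }
}
```

Such a check only reports that an entry looks wrong. It does not tell where the object really is; for that, one would have to scan the file for the matching {{N G obj}} markers.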

> "Page tree root must be a dictionary" on PDDocument.load
> --------------------------------------------------------
>
>                 Key: PDFBOX-4915
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4915
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.19
>            Reporter: Gauthier Roebroeck
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>         Attachments: Black Bullet - Volume 01 - Those Who Would Be Gods [Yen 
> Press][Kobo_Kitzoku].pdf, Screenshot 2020-07-14 at 20.19.40.png
>
>
> Hi,
> I have a PDF file that throws the following exception:
> {{java.io.IOException: Page tree root must be a dictionary
>  at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198) ~[pdfbox-2.0.19.jar:2.0.19]
>  at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) ~[pdfbox-2.0.19.jar:2.0.19]
>  at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222) ~[pdfbox-2.0.19.jar:2.0.19]
>  at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122) ~[pdfbox-2.0.19.jar:2.0.19]}}
> This happens when loading the document from an InputStream.
> The document can be opened properly using Preview on Mac.
>  
> I have checked the PDF structure (even though I don't know it very well), 
> and from what I can see it could be because the /Pages is not the first 
> element under the /Root.
>  
> !Screenshot 2020-07-14 at 20.19.40.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

