[
https://issues.apache.org/jira/browse/PDFBOX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158011#comment-17158011
]
Michael Klink commented on PDFBOX-4915:
---------------------------------------
There are some errors in the cross reference table of your PDF:
* It has multiple entries like this:
{noformat}
0000000000 00000 n
{noformat}
These entries claim that the corresponding object starts at offset 0, the
very start of the file. What is actually there, after two comment lines, is
object 1, which is not the object these entries belong to. Thus, these
pointers are incorrect. If these entries are meant to express something like
"unused" or {{null}}, a free entry ({{... f}}) should have been used instead.
* It has only a single cross reference table, no incremental updates, but it
already contains generation 1 objects, e.g. object 1:
{noformat}
xref
0 1712
0000000000 65535 f
0000000015 00001 n
{noformat}
This is invalid. In the first revision of a document, all objects other than
object 0 must have generation 0:
{panel:title=ISO 32000-1, section 7.5.4 "Cross-Reference Table"}
Except for object number 0, all objects in the cross-reference table shall
initially have generation numbers of 0.
{panel}
This object 1 with generation 1 is actually the page tree root. PDFBox
probably has problems with this invalid generation number.
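
The two checks above can be sketched in a few lines of plain Java. This is only an illustration, not PDFBox code: the class {{XrefCheck}} and its method {{findProblems}} are hypothetical names, the input is assumed to be the text of a classic (non-stream) cross-reference table, and the generation check assumes the file has a single revision, as this PDF does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of PDFBox: flags suspicious entries in a classic xref table. */
final class XrefCheck {

    // Subsection header: "<first object number> <number of entries>"
    private static final Pattern HEADER = Pattern.compile("^(\\d+)\\s+(\\d+)$");
    // Entry: 10-digit offset, 5-digit generation, then 'n' (in use) or 'f' (free)
    private static final Pattern ENTRY = Pattern.compile("^(\\d{10})\\s+(\\d{5})\\s+([nf])$");

    /** Returns one message per problem found; assumes a single-revision file. */
    static List<String> findProblems(String xrefText) {
        List<String> problems = new ArrayList<>();
        long objNum = -1; // object number of the next entry, -1 until a header is seen
        for (String raw : xrefText.split("\\r?\\n|\\r")) {
            String line = raw.trim();
            Matcher entry = ENTRY.matcher(line);
            Matcher header = HEADER.matcher(line);
            if (entry.matches()) {
                if (objNum < 0) {
                    continue; // entry without a preceding subsection header, skip it
                }
                long offset = Long.parseLong(entry.group(1));
                int generation = Integer.parseInt(entry.group(2));
                boolean inUse = entry.group(3).equals("n");
                // Offset 0 holds the "%PDF-" header, so no object can start there.
                if (inUse && offset == 0) {
                    problems.add("object " + objNum + ": in-use entry points at offset 0");
                }
                // In a first revision every in-use object must have generation 0.
                if (inUse && generation != 0) {
                    problems.add("object " + objNum + ": generation " + generation
                            + " in a first revision");
                }
                objNum++;
            } else if (header.matches()) {
                objNum = Long.parseLong(header.group(1));
            } else if (line.equals("trailer")) {
                break; // end of the cross-reference table
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Entries modelled on the ones quoted above; the last line is made up.
        String sample = "xref\n0 4\n"
                + "0000000000 65535 f\n"
                + "0000000015 00001 n\n"
                + "0000000000 00000 n\n"
                + "0000000100 00000 n\n";
        findProblems(sample).forEach(System.out::println);
    }
}
```

Run against the sample, this reports object 1 for its generation and object 2 for its zero offset. The free entry for object 0 is left alone.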
> "Page tree root must be a dictionary" on PDDocument.load
> --------------------------------------------------------
>
> Key: PDFBOX-4915
> URL: https://issues.apache.org/jira/browse/PDFBOX-4915
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.19
> Reporter: Gauthier Roebroeck
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Attachments: Black Bullet - Volume 01 - Those Who Would Be Gods [Yen
> Press][Kobo_Kitzoku].pdf, Screenshot 2020-07-14 at 20.19.40.png
>
>
> Hi,
> I have a PDF file that throws the following exception:
> {{java.io.IOException: Page tree root must be a dictionary
>  at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198) ~[pdfbox-2.0.19.jar:2.0.19]
>  at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) ~[pdfbox-2.0.19.jar:2.0.19]
>  at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222) ~[pdfbox-2.0.19.jar:2.0.19]
>  at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122) ~[pdfbox-2.0.19.jar:2.0.19]}}
> This happens when loading the document from an InputStream.
> The document can be opened properly using Preview on Mac.
>
> I have checked the PDF structure (even though I don't know it very well),
> and from what I can see it could be because /Pages is not the first element
> under /Root.
>
> !Screenshot 2020-07-14 at 20.19.40.png!