Andreas Lehmkühler closed PDFBOX-4049.
    Resolution: Duplicate

The garbage at the beginning of the PDF is the root cause of the issue. PDFBox 
realizes that all offsets are bad and triggers a brute-force search. As the PDF 
uses compressed and encrypted streams, the brute-force mechanism isn't able to 
repair the PDF; see PDFBOX-4097.

You should ensure that such garbage is removed before parsing the file, so that 
the repair mechanism is not triggered.
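
One way to remove such garbage is sketched below. This is only an illustration, 
not part of PDFBox: the class name PdfHeaderTrimmer and its methods are 
hypothetical, and it assumes the stray bytes were prepended to an otherwise 
intact file, so that dropping everything before the first "%PDF-" marker makes 
the stored offsets valid again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not a PDFBox class): drops any bytes that precede
// the first "%PDF-" header marker in a file that was read into memory.
final class PdfHeaderTrimmer {
    private static final byte[] MARKER =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns the index of the first "%PDF-" marker, or -1 if there is none.
    static int indexOfHeader(byte[] data) {
        for (int i = 0; i + MARKER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // Returns the input unchanged if it already starts with the marker or
    // contains no marker at all; otherwise returns a copy that starts at
    // the marker.
    static byte[] stripLeadingGarbage(byte[] data) {
        int start = indexOfHeader(data);
        if (start <= 0) {
            return data;
        }
        return Arrays.copyOfRange(data, start, data.length);
    }
}
```

With PDFBox 2.0.x the trimmed bytes could then be handed to 
PDDocument.load(byte[]) instead of loading the original file. If the offsets 
in the file were written with the garbage already in place, trimming will not 
help and the repair mechanism will still be triggered.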

Closed as duplicate of PDFBOX-4097.

> IllegalArgumentException: root cannot be null
> ---------------------------------------------
>                 Key: PDFBOX-4049
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4049
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.8
>         Environment: Windows 10
>            Reporter: savan patel
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>              Labels: regression
>         Attachments: 372d5dd7-d4b8-41b2-9f50-80c1353aee59.pdf
> I have a PDF for which PDFBox throws an error while parsing it.
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException: root cannot be null
>         at org.apache.pdfbox.pdmodel.PDPageTree.<init>(PDPageTree.java:75)
>         at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getPages(PDDocumentCatalog.java:129)
>         at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:1411)
> {code}
> This did not happen with 2.0.7.

This message was sent by Atlassian JIRA
