[jira] [Resolved] (PDFBOX-1467) PDocumentCatalog.getAllPages returns empty list for certain pdfs, affects many other methods as well

Jeremias Maerki (JIRA) Thu, 28 Feb 2013 08:41:14 -0800

     [ 
https://issues.apache.org/jira/browse/PDFBOX-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jeremias Maerki resolved PDFBOX-1467.
-------------------------------------

    Resolution: Not A Problem

I've just stumbled over the same problem with a different PDF. What the two 
have in common: they are encrypted. So I assume that you didn't decrypt the 
document. After adding the following code right after loading the PDF, it works 
now for me:

{code}
String password = "";
if (doc.isEncrypted()) {
    try {
        doc.decrypt(password);
    } catch (InvalidPasswordException e) {
        System.err.println("Invalid password for encrypted document.");
    } catch (CryptographyException e) {
        throw new IOException("Error decrypting PDF document: " + e.getMessage(), e);
    }
}
{code}
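
For illustration only, here is a minimal, self-contained sketch of why the guard above matters. It does not use PDFBox: the Doc interface and the FakeEncryptedDoc class are hypothetical stand-ins that only mirror the three PDDocument methods mentioned in this issue (isEncrypted, decrypt, getNumberOfPages), and the toy document simply imitates the reported symptom of zero pages until decryption. The pageCount helper is likewise an assumed name, not part of any library.

```java
import java.io.IOException;

public class DecryptGuardSketch {

    /** Hypothetical stand-in for the PDDocument methods discussed in this issue. */
    interface Doc {
        boolean isEncrypted();
        void decrypt(String password) throws IOException;
        int getNumberOfPages();
    }

    /** Toy document that reports 0 pages until it has been decrypted. */
    static class FakeEncryptedDoc implements Doc {
        private final String expectedPassword;
        private final int pages;
        private boolean decrypted = false;

        FakeEncryptedDoc(String expectedPassword, int pages) {
            this.expectedPassword = expectedPassword;
            this.pages = pages;
        }

        public boolean isEncrypted() {
            return true; // the toy document always carries encryption
        }

        public void decrypt(String password) throws IOException {
            if (!expectedPassword.equals(password)) {
                throw new IOException("Invalid password for encrypted document.");
            }
            decrypted = true;
        }

        public int getNumberOfPages() {
            // Imitates the symptom: the page tree is unreadable before decryption.
            return decrypted ? pages : 0;
        }
    }

    /** Decrypts first when needed, then asks for the page count. */
    static int pageCount(Doc doc, String password) throws IOException {
        if (doc.isEncrypted()) {
            doc.decrypt(password);
        }
        return doc.getNumberOfPages();
    }

    public static void main(String[] args) throws IOException {
        Doc doc = new FakeEncryptedDoc("", 3);
        System.out.println("before decrypt: " + doc.getNumberOfPages());
        System.out.println("after decrypt:  " + pageCount(doc, ""));
    }
}
```

With a real PDDocument, the same ordering applies: run the decryption check directly after loading and before any call that walks the page tree.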
                
> PDocumentCatalog.getAllPages returns empty list for certain pdfs, affects 
> many other methods as well
> ----------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1467
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1467
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>            Reporter: Peter Lehto
>            Priority: Critical
>         Attachments: cd17ac7f-675c-4cc8-859b-5bd9d509cb1a.pdf
>
>
> Originally found in PageExtractor. After some debugging, it seems that 
> PDDocumentCatalog.getAllPages returns an empty list for certain PDFs. Also, 
> calling PDDocument.getNumberOfPages returns 0, as it uses the catalog for 
> getting the actual information. This goes all the way down to 
> COSDictionary.getDictionaryObject, which returns null for COSName.PAGES.
> Eventually everything that has something to do with page numbers fails, for 
> example saving the document to a stream.
> This problem occurs with certain PDF documents. I suspect they have some 
> kind of different structure or header information, or possibly even a corrupted 
> header. With other PDF files this problem does not exist. The non-working PDF 
> files can still be opened with other software such as Adobe Reader, and they 
> work there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
