[ 
https://issues.apache.org/jira/browse/PDFBOX-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934351#action_12934351
 ] 

Martijn Brinkers commented on PDFBOX-858:
-----------------------------------------

You should decrypt the document before reading its metadata:

            if (document.isEncrypted())
            {
                try
                {
                    // Try the standard (empty) user password.
                    document.decrypt("");
                }
                catch (CryptographyException e)
                {
                    // decryption itself failed; handle
                }
                catch (InvalidPasswordException e)
                {
                    // the empty password was rejected; handle
                }
            }

The metadata seems to be encrypted with the default (empty) user password.
Once the PDF is decrypted, I can read the metadata without problems.
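
To illustrate why an encrypted Author field shows up as gibberish, here is a toy sketch that uses only the JDK, not PDFBox. It assumes RC4 as a stand-in cipher with a made-up key and author name; a real PDF derives its keys from the password and document data, which is not shown here. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

class MetadataCipherDemo {

    // RC4 is a symmetric stream cipher: applying it twice with the
    // same key restores the original bytes.
    static byte[] rc4(byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RC4 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Made-up 16-byte key and author value, for illustration only.
        byte[] key = "hypothetical-key".getBytes(StandardCharsets.US_ASCII);
        String author = "Jane Doe";

        // What is stored in the encrypted file.
        byte[] stored = rc4(key, author.getBytes(StandardCharsets.ISO_8859_1));

        // Reading the stored bytes as text without decrypting gives garbage.
        String notDecrypted = new String(stored, StandardCharsets.ISO_8859_1);
        // Decrypting first gives back the readable value.
        String decrypted = new String(rc4(key, stored), StandardCharsets.ISO_8859_1);

        System.out.println("Author (not decrypted) = " + notDecrypted);
        System.out.println("Author (decrypted)     = " + decrypted);
    }
}
```

The first line printed is unreadable and the second is the original value, which matches what happens when the metadata is read before and after decrypting the document.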

> Metadata extraction broken on some PDF files
> --------------------------------------------
>
>                 Key: PDFBOX-858
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-858
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1
>            Reporter: Patrik Stenmark
>         Attachments: 2001Derivatives and Public Debt Mngt.pdf, 
> RethinkingTheFinancialNetwork.pdf
>
>
> On certain PDF files (examples attached), the metadata extraction seems to be
> broken. Preview (on Mac OS X) and Acrobat Reader are able to read the
> metadata, but PDFBox gives complete gibberish:
> Author=è'ÿÆ??kÔ7??ÕªG?
> I've tried both the version included in Tika 0.7 (1.0.0 I believe) and 
> r1021264 from SVN. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
