[ 
https://issues.apache.org/jira/browse/PDFBOX-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180310#comment-13180310
 ] 

Timo Boehme commented on PDFBOX-1202:
-------------------------------------

Reading (displaying) vs. text extraction:
For an encrypted document, the creator can set flags that define what a user 
may do with the document besides viewing it; one of them is 
"Copying or otherwise extracting text and graphics from the document..." (PDF 
spec. 1.7, p.121). No mechanism prevents a program from ignoring these 
settings once it has decrypted the document, but the spec. states: "It is 
up to the implementors of PDF consumer applications to respect the intent of 
the document creator by restricting user access to an encrypted PDF file 
according to the permissions contained in the file."
Within PDFBox you can find this check e.g. in o.a.p.ExtractText. If the check 
is removed, the text content will be extracted without errors.
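
To illustrate what such a permission check boils down to, here is a minimal 
sketch that works on the raw integer permission flags (the /P entry of the 
encryption dictionary). It assumes the bit numbering of the standard security 
handler in the PDF spec, where bits are counted from 1 at the low-order end 
and bit 5 controls copying or extracting text and graphics. The class and 
method names are made up for this example and are not PDFBox API; in PDFBox 
itself the equivalent test should be AccessPermission.canExtractContent().

```java
/** Illustrative sketch only: tests the "extract" permission in a raw PDF /P value. */
final class PermissionBits {

    /** Bit 5 (counted from 1 at the low-order end): copy or extract text and graphics. */
    static final int EXTRACT_BIT = 5;

    private PermissionBits() {
        // static helpers only
    }

    /** Returns true if the given 1-based bit is set in the permission flags. */
    static boolean isBitSet(int permissions, int bit) {
        return (permissions & (1 << (bit - 1))) != 0;
    }

    /** Returns true if the flags allow copying or extracting text and graphics. */
    static boolean canExtractContent(int permissions) {
        return isBitSet(permissions, EXTRACT_BIT);
    }
}
```

For example, a flags value of -4 (every permission bit set) allows extraction, 
while -20 (the same value with bit 5 cleared) does not. Nothing in the file 
format enforces this; the consumer application has to perform the test itself 
and refuse to extract.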

In your case it seems that not all objects were decrypted, and therefore the 
stream parsing failed. Since ExtractText works, you are presumably using a 
different (or your own) extraction method which accesses other objects. So it 
might be a bug in PDFBox (a specific stream is not decrypted, or its decrypted 
form is not used), or it depends on how your routine accesses the document.

For further investigation it would be necessary to know the object number of 
the problematic stream or to have the complete routine you use for reading the 
document.

                
> org.apache.pdfbox.filter.FlateFilter decode SEVERE: Stop reading corrupt 
> stream
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1202
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1202
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.6.0
>            Reporter: Ilija Pavlic
>            Priority: Minor
>         Attachments: IATAUnitedStates.pdf
>
>
> Error "org.apache.pdfbox.filter.FlateFilter decode SEVERE: Stop reading 
> corrupt stream" thrown when extracting text.
> The document was loaded with the following snippet:
> document = PDDocument.load("C:/Users/ilija.pavlic/Downloads/TestInput.pdf");
> if (document.isEncrypted()) {
>     try {
>         document.decrypt("");
>     } catch (InvalidPasswordException e) {
>         System.err.println("Error: Document is encrypted with a password.");
>         System.exit(1);
>     }
> }


        
