[ https://issues.apache.org/jira/browse/PDFBOX-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180310#comment-13180310 ]
Timo Boehme commented on PDFBOX-1202:
-------------------------------------

Reading (displaying) vs. text extraction:
For an encrypted document one can set certain flags defining what may be done with the document besides presenting it to the user; one of them is "Copying or otherwise extracting text and graphics from the document..." (PDF spec. 1.7, p.121). There is no mechanism that prevents a program from ignoring these settings once it has decrypted the document, but the spec. states: "It is up to the implementors of PDF consumer applications to respect the intent of the document creator by restricting user access to an encrypted PDF file according to the permissions contained in the file."

Within PDFBox you can find this test e.g. in o.a.p.ExtractText. If the test is removed, the text content will be extracted without errors.

In your case it seems that not all objects got decrypted and therefore the stream parsing failed. Since ExtractText works, you are presumably using a different (your own) extraction method which accesses other objects. So it might be a PDFBox bug (a specific stream not being decrypted, or its decrypted form not being used), or it may depend on how your routine accesses the document.

For further investigation it would be necessary to know the object number of the problematic stream or to have the complete routine you use for reading the document.

> org.apache.pdfbox.filter.FlateFilter decode SEVERE: Stop reading corrupt stream
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1202
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1202
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.6.0
>            Reporter: Ilija Pavlic
>            Priority: Minor
>         Attachments: IATAUnitedStates.pdf
>
>
> Error "org.apache.pdfbox.filter.FlateFilter decode SEVERE: Stop reading corrupt stream" thrown when extracting text.
> The document was loaded with the following snippet:
>     document = PDDocument.load("C:/Users/ilija.pavlic/Downloads/TestInput.pdf");
>     if (document.isEncrypted()) {
>         try {
>             document.decrypt("");
>         } catch (InvalidPasswordException e) {
>             System.err.println("Error: Document is encrypted with a password.");
>             System.exit(1);
>         }
>     }
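
To illustrate the permission test mentioned in the comment: the flags are stored as a 32-bit integer (the /P entry of the encryption dictionary), and in the PDF 1.7 numbering, where bit 1 is the low-order bit, bit 5 controls copying or extracting text and graphics. The following is only a minimal, library-independent sketch of such a check; the class PermissionFlags and its methods are hypothetical names chosen for this example and are not PDFBox API. In PDFBox itself the corresponding check is, as far as I know, AccessPermission.canExtractContent().

```java
/**
 * Sketch of decoding the user access permission flags (/P) of an
 * encrypted PDF. Hypothetical helper class, not part of PDFBox.
 */
final class PermissionFlags {

    // Bit positions are 1-based; bit 1 is the low-order bit.
    static final int PRINT_BIT = 3;    // print the document
    static final int MODIFY_BIT = 4;   // modify the contents
    static final int EXTRACT_BIT = 5;  // copy or extract text and graphics

    private final int flags;

    PermissionFlags(int flags) {
        this.flags = flags;
    }

    private boolean isBitSet(int bit) {
        return (flags & (1 << (bit - 1))) != 0;
    }

    boolean canPrint() {
        return isBitSet(PRINT_BIT);
    }

    boolean canModify() {
        return isBitSet(MODIFY_BIT);
    }

    boolean canExtractContent() {
        return isBitSet(EXTRACT_BIT);
    }

    /**
     * The kind of voluntary test a consumer application performs before
     * extracting text. Nothing technically enforces it after decryption.
     */
    void requireExtractPermission() {
        if (!canExtractContent()) {
            throw new IllegalStateException(
                "You do not have permission to extract text");
        }
    }

    public static void main(String[] args) {
        // Example flag values chosen for illustration only.
        PermissionFlags permissive = new PermissionFlags(-44);
        PermissionFlags restrictive = new PermissionFlags(-3904);
        System.out.println("P=-44   canExtractContent: "
            + permissive.canExtractContent());
        System.out.println("P=-3904 canExtractContent: "
            + restrictive.canExtractContent());
    }
}
```

Note that the check is purely advisory: once the streams are decrypted, skipping requireExtractPermission() does not affect whether the content can be read, which is why the spec. leaves enforcement to the consumer application.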