[
https://issues.apache.org/jira/browse/PDFBOX-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219474#comment-14219474
]
Ekaterina commented on PDFBOX-2510:
-----------------------------------
I am using the non-sequential parser in Tika with pdfbox-1.8.8-SNAPSHOT, and now
it gives me:
org.apache.tika.exception.TikaException: Unable to extract PDF content
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:146)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:159)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
	at com.majio.core.gate.utils.DocUtils.readInputStream(DocUtils.java:59)
	at com.majio.core.gate.utils.DocUtils.readDoc(DocUtils.java:30)
	at com.majio.core.gate.utils.DocUtils.main(DocUtils.java:83)
Caused by: org.apache.pdfbox.exceptions.WrappedIOException
	at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:377)
	at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptStream(SecurityHandler.java:475)
	at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:439)
	at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptObject(SecurityHandler.java:409)
	at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.proceedDecryption(SecurityHandler.java:221)
	at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.decryptDocument(StandardSecurityHandler.java:158)
	at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1601)
	at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:947)
	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:357)
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:130)
	... 7 more
Caused by: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
	at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:913)
	at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:824)
	at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:436)
	at javax.crypto.Cipher.doFinal(Cipher.java:2179)
	at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:355)
	... 16 more
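The innermost cause in this trace is a generic JCE error, not a PDFBox-specific one. A padded AES cipher in decrypt mode rejects any input whose length is not a multiple of the 16-byte block size. The following minimal sketch reproduces that exception using only the JDK's `javax.crypto` API. The all-zero key and IV, the sample plaintext, and the deliberate one-byte truncation are illustrative assumptions. The sketch does not claim to show what is actually wrong with DV.pdf, and it is not PDFBox's own code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.IllegalBlockSizeException;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class TruncatedCiphertextDemo {

    /**
     * Encrypts a short string with AES/CBC/PKCS5Padding, drops the last byte of the
     * ciphertext so its length is no longer a multiple of 16, then tries to decrypt it.
     * Returns the simple name of the exception thrown, or "none" if decryption succeeds.
     */
    static String decryptTruncatedCiphertext() throws Exception {
        // Assumed all-zero 128-bit key and IV, for illustration only.
        SecretKeySpec key = new SecretKeySpec(new byte[16], "AES");
        IvParameterSpec iv = new IvParameterSpec(new byte[16]);

        Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, key, iv);
        // 11 bytes of plaintext are padded up to one 16-byte block.
        byte[] ciphertext = enc.doFinal("stream data".getBytes(StandardCharsets.US_ASCII));

        // Simulate stream data of the wrong length: keep only 15 of the 16 bytes.
        byte[] truncated = Arrays.copyOf(ciphertext, ciphertext.length - 1);

        Cipher dec = Cipher.getInstance("AES/CBC/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key, iv);
        try {
            dec.doFinal(truncated);
            return "none";
        } catch (IllegalBlockSizeException e) {
            // Message: "Input length must be multiple of 16 when decrypting with padded cipher"
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(decryptTruncatedCiphertext());
    }
}
```

If the bytes handed to the AES cipher for some stream in the file had a length that was not a multiple of 16, decryption would fail with exactly this message. Whether that is what happens with the attached file would still need to be checked.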
> Getting "Error: The supplied password does not match either the owner or user
> password in the document." while trying to parse pdf without password in
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-2510
> URL: https://issues.apache.org/jira/browse/PDFBOX-2510
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.8
> Reporter: Ekaterina
> Attachments: DV.pdf
>
>
> I have a PDF that was parsed correctly for some time, and suddenly I got
> "javax.crypto.BadPaddingException: Given final block not properly padded"
> when I tried to parse it with pdfbox-1.8.7. Then I tried
> pdfbox-1.8.8-SNAPSHOT and got "Error: The supplied password does not
> match either the owner or user password in the document." Here is the code
> I'm using:
> ContentHandler handler = new BodyContentHandler(400000);
> Metadata metadata = new Metadata();
> Parser parser = new AutoDetectParser();
> try (TikaInputStream stream = TikaInputStream.get(input)) {
>     parser.parse(stream, handler, metadata, new ParseContext());
> } catch (IOException | SAXException | TikaException e) {
>     LOG.error("Could not parse the input document", e);
> }
> return handler.toString();
> (I am using it with tika-parsers-1.6)
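The pdfbox-1.8.7 symptom, `BadPaddingException: Given final block not properly padded`, is the other common failure of a padded cipher. Here the input length is valid, but after decryption the final byte is not a legal PKCS#5/PKCS#7 padding value, which must be 1 to 16 for AES. The following minimal JDK-only sketch triggers it deterministically, again with an assumed all-zero key and IV. It encrypts one all-zero block without padding, then decrypts it with padding, which leaves a final byte of 0x00 that the padded cipher rejects. In practice a wrong decryption key usually surfaces the same way, because the decrypted final block is effectively random. This illustrates the exception only; it is not a diagnosis of DV.pdf.

```java
import javax.crypto.BadPaddingException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class BadPaddingDemo {

    /**
     * Produces a single ciphertext block whose plaintext ends in 0x00 (an invalid
     * PKCS#5 padding byte), then decrypts it with a padded cipher.
     * Returns the simple name of the exception thrown, or "none" if decryption succeeds.
     */
    static String decryptWithInvalidPadding() throws Exception {
        // Assumed all-zero 128-bit key and IV, for illustration only.
        SecretKeySpec key = new SecretKeySpec(new byte[16], "AES");
        IvParameterSpec iv = new IvParameterSpec(new byte[16]);

        // Encrypt one all-zero block WITHOUT padding, so the plaintext's last byte is 0x00.
        Cipher enc = Cipher.getInstance("AES/CBC/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, iv);
        byte[] ciphertext = enc.doFinal(new byte[16]);

        // A padded cipher expects the last decrypted byte to be in 1..16; 0x00 is rejected.
        Cipher dec = Cipher.getInstance("AES/CBC/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key, iv);
        try {
            dec.doFinal(ciphertext);
            return "none";
        } catch (BadPaddingException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(decryptWithInvalidPadding());
    }
}
```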