[
https://issues.apache.org/jira/browse/TIKA-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304109#comment-17304109
]
Bertrand Caron edited comment on TIKA-3331 at 3/18/21, 12:34 PM:
-----------------------------------------------------------------
Thank you for your answer. I was probably influenced by JHOVE's behavior, because
I noticed this error only with the GUI. When I checked the same files with the
command line, Tika did not return any error and was even able to retrieve text
and metadata despite the encryption.
So I guess this is just a bug with the GUI?
> Return a more informative error when trying to parse an encrypted file
> ----------------------------------------------------------------------
>
> Key: TIKA-3331
> URL: https://issues.apache.org/jira/browse/TIKA-3331
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.24.1
> Environment: See enclosed picture.
> Reporter: Bertrand Caron
> Priority: Minor
> Attachments: encrypte.odt, system.png
>
>
> When parsing an encrypted PDF or ODF file, Tika returns a long, cryptic error
> message. A more informative message would be useful for the user. Could it at
> least mention the encryption, and perhaps the algorithm used?
>
> I enclose a fabricated example, but real-world examples can be found in a
> similar issue for the JHOVE tool:
> [https://github.com/openpreserve/jhove/issues/640]
>
> The error log obtained:
>
> Apache Tika was unable to parse the document
> at /home/bertrand/Téléchargements/Toponymic guidelines_Instituto geografico nacional_2011.pdf.
> The full exception stack trace is included below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@5e7e878d
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
> at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
> at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:358)
> at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:309)
> at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:267)
> at java.desktop/javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1967)
> at java.desktop/javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2308)
> at java.desktop/javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:405)
> at java.desktop/javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:262)
> at java.desktop/javax.swing.AbstractButton.doClick(AbstractButton.java:369)
> at java.desktop/javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:1020)
> at java.desktop/javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:1064)
> at java.desktop/java.awt.Component.processMouseEvent(Component.java:6636)
> at java.desktop/javax.swing.JComponent.processMouseEvent(JComponent.java:3342)
> at java.desktop/java.awt.Component.processEvent(Component.java:6401)
> at java.desktop/java.awt.Container.processEvent(Container.java:2263)
> at java.desktop/java.awt.Component.dispatchEventImpl(Component.java:5012)
> at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2321)
> at java.desktop/java.awt.Component.dispatchEvent(Component.java:4844)
> at java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4919)
> at java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Container.java:4548)
> at java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Container.java:4489)
> at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2307)
> at java.desktop/java.awt.Window.dispatchEventImpl(Window.java:2764)
> at java.desktop/java.awt.Component.dispatchEvent(Component.java:4844)
> at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:772)
> at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:721)
> at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:715)
> at java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
> at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
> at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:95)
> at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:745)
> at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:743)
> at java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
> at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
> at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
> at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
> at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
> at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
> at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
> at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
> at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)
> Caused by: java.lang.NullPointerException
> at org.apache.tika.parser.pdf.AbstractPDF2XHTML.extractXMPXFA(AbstractPDF2XHTML.java:209)
> at org.apache.tika.parser.pdf.AbstractPDF2XHTML.endDocument(AbstractPDF2XHTML.java:678)
> at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:267)
> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:96)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:174)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 44 more
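
In the trace above, the useful information is the innermost "Caused by" entry (a NullPointerException in AbstractPDF2XHTML.extractXMPXFA), and it sits below dozens of Swing and AWT frames. As a rough illustration of what a shorter message could look like, here is a minimal sketch that walks an exception's cause chain and reports only the root cause and its first frame. It uses only the standard Java Throwable API. The class ExceptionSummary and its methods are hypothetical names made up for this example; they are not part of Tika, and this is not how Tika or its GUI reports errors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Tika class): condenses an exception chain to one line. */
final class ExceptionSummary {

    private ExceptionSummary() {}

    /** Returns the chain from the outermost exception down to the root cause. */
    static List<Throwable> causeChain(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        // Stop on null, and also if a cause refers back to an exception already seen.
        while (t != null && !chain.contains(t)) {
            chain.add(t);
            t = t.getCause();
        }
        return chain;
    }

    /** Builds "RootCauseClass[: message][ at firstFrame]" for the given exception. */
    static String summarize(Throwable t) {
        if (t == null) {
            return "no exception";
        }
        List<Throwable> chain = causeChain(t);
        Throwable root = chain.get(chain.size() - 1);
        StringBuilder sb = new StringBuilder(root.getClass().getName());
        if (root.getMessage() != null) {
            sb.append(": ").append(root.getMessage());
        }
        StackTraceElement[] frames = root.getStackTrace();
        if (frames.length > 0) {
            // The first frame is where the root cause was created or thrown.
            sb.append(" at ").append(frames[0]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fabricated chain that only mimics the shape of the trace above.
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from parser", new NullPointerException());
        System.out.println(summarize(wrapped));
    }
}
```

A summary like this would not say that the file is encrypted; it would only make the underlying failure easier to spot. Mentioning the encryption itself would require the parser to detect it and report it explicitly.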