[ https://issues.apache.org/jira/browse/TIKA-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Kraus updated TIKA-3319:
--------------------------------
Description:
01 Tika-1.24.1.jar and 1.24 python module have been running well for months on
my machine.
02 Then I installed Tesseract and a couple of other things to integrate with it.
03 Then I upgraded Python from 3.8.2 to 3.9.2.
04 I have always set the Windows 10 $env: variable to something like
TIKA_SERVER_JAR="<yourpath>/tika-server.jar"
05 Then I run the Tika Python module and get this urllib error:
urllib.error.URLError: <urlopen error unknown url type: c>
06 Supposedly this is fixed by setting the $env: variable to something like...
TIKA_SERVER_JAR="file:///<yourpath>/tika-server.jar"
07 So I do this and mess around with it; no dice.
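The "unknown url type: c" message in step 05 is consistent with the drive letter of a bare Windows path being read as a URL scheme. The sketch below uses only the Python standard library to show that parsing and to build a file:/// value of the shape suggested in step 06. The helper name to_file_uri and the path are illustrative assumptions; how the tika Python module itself consumes TIKA_SERVER_JAR is not shown here.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote, urlparse


def to_file_uri(windows_path: str) -> str:
    """Turn an absolute Windows path into a file:/// URI (hypothetical helper)."""
    # as_posix() swaps backslashes for forward slashes: C:\a\b -> C:/a/b
    posix = PureWindowsPath(windows_path).as_posix()
    # Percent-encode spaces and similar characters, but keep "/" and the drive colon.
    return "file:///" + quote(posix, safe="/:")


jar = r"C:\PATH\TO\tika-server.jar"  # placeholder path, not a real location

# A bare Windows path parses with the drive letter as its "scheme".
print(urlparse(jar).scheme)

# The converted value parses with the "file" scheme instead.
uri = to_file_uri(jar)
print(uri)
print(urlparse(uri).scheme)
```

Printing the value that actually reaches Python (for example os.environ.get("TIKA_SERVER_JAR")) may help confirm whether the file:/// form set in PowerShell is the one the module sees.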
08 So then I try running Tika directly from PowerShell:
java -jar "C:\PATH\TO\tika-app-1.24.1.jar" --gui
This brings up the GUI, but it now prints these warnings:
{quote}Mar 14, 2021 10:33:27 PM
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See [https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io]
for optional dependencies.
Mar 14, 2021 10:33:27 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image
files unless
you've excluded the TesseractOCRParser from the default parser.
Tesseract may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on
via TikaConfig.
Mar 14, 2021 10:33:27 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
{quote}
09 Now when I use the GUI (--gui) to parse a file I have parsed successfully
before, it shows this message:
{quote}Apache Tika was unable to parse the document at C:\CODING\Apache Tika\Test03.pdf.
The full exception stack trace is included below:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@473cb131
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:358)
at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:309)
at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:267)
at java.desktop/javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1967)
at java.desktop/javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2308)
at java.desktop/javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:405)
at java.desktop/javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:262)
at java.desktop/javax.swing.AbstractButton.doClick(AbstractButton.java:369)
at java.desktop/javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:1020)
at java.desktop/javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:1064)
at java.desktop/java.awt.Component.processMouseEvent(Component.java:6636)
at java.desktop/javax.swing.JComponent.processMouseEvent(JComponent.java:3342)
at java.desktop/java.awt.Component.processEvent(Component.java:6401)
at java.desktop/java.awt.Container.processEvent(Container.java:2263)
at java.desktop/java.awt.Component.dispatchEventImpl(Component.java:5012)
at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2321)
at java.desktop/java.awt.Component.dispatchEvent(Component.java:4844)
at java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4919)
at java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Container.java:4548)
at java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Container.java:4489)
at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2307)
at java.desktop/java.awt.Window.dispatchEventImpl(Window.java:2764)
at java.desktop/java.awt.Component.dispatchEvent(Component.java:4844)
at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:772)
at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:721)
at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:715)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:95)
at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:745)
at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:743)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)
Caused by: java.lang.NullPointerException
at org.apache.tika.parser.pdf.AbstractPDF2XHTML.extractXMPXFA(AbstractPDF2XHTML.java:209)
at org.apache.tika.parser.pdf.AbstractPDF2XHTML.endDocument(AbstractPDF2XHTML.java:678)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:267)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:96)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:174)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 44 more
{quote}
10 Most notably these lines:
{quote}A) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.pdf.PDFParser@473cb131
B) Caused by: java.lang.NullPointerException
{quote}
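When a trace is pasted with its line breaks lost, as in step 09, the innermost "Caused by:" section and its first frame can be pulled out mechanically. This is a minimal sketch for reading such traces, not part of Tika; the function name root_cause and the regular expression are assumptions made for illustration.

```python
import re


def root_cause(trace: str):
    """Return (exception class, first frame) of the innermost cause in a Java trace."""
    # Collapse all whitespace so wrapped or flattened traces look the same.
    flat = " ".join(trace.split())
    # The last "Caused by:" section is the innermost exception; with no such
    # section, the whole trace is used and the top-level exception is returned.
    section = flat.split("Caused by: ")[-1]
    match = re.match(
        r"(?P<exc>[\w.$]+)(?::\s*.*?)?\s+at\s+(?P<frame>[\w.$/<>]+\([^)]*\))",
        section,
    )
    if match is None:
        raise ValueError("no exception header followed by a stack frame found")
    return match.group("exc"), match.group("frame")


# Shortened excerpt of the trace from step 09, line breaks as pasted.
sample = """org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.pdf.PDFParser@473cb131 at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293) at
java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)Caused
by: java.lang.NullPointerException at
org.apache.tika.parser.pdf.AbstractPDF2XHTML.extractXMPXFA(AbstractPDF2XHTML.java:209)
at
org.apache.tika.parser.pdf.AbstractPDF2XHTML.endDocument(AbstractPDF2XHTML.java:678)"""

print(root_cause(sample))
```

For this report the result points at AbstractPDF2XHTML.extractXMPXFA (AbstractPDF2XHTML.java:209), which is the frame where the NullPointerException is thrown.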
11 Here is the output of java -jar tika-app-1.24.1.jar --dump-current-config:
{quote}Mar 14, 2021 10:15:23 PM
org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See [https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io]
for optional dependencies.
Mar 14, 2021 10:15:24 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image
files unless
you've excluded the TesseractOCRParser from the default parser.
Tesseract may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on
via TikaConfig.
Mar 14, 2021 10:15:24 PM org.apache.tika.config.InitializableProblemHandler$3
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<properties>
<!--for example: <mimeTypeRepository
resource="/org/apache/tika/mime/tika-mimetypes.xml"/>-->
<service-loader dynamic="true" loadErrorHandler="IGNORE"/>
<encodingDetectors>
<encodingDetector class="org.apache.tika.detect.DefaultEncodingDetector"/>
</encodingDetectors>
<translator class="org.apache.tika.language.translate.DefaultTranslator"/>
<detectors>
<detector class="org.apache.tika.detect.DefaultDetector"/>
</detectors>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser"/>
</parsers>
</properties>
{quote}
12 Any help would be greatly appreciated.
13A The odd thing is that when I run something like
java -jar tika-app-1.24.1.jar -t Test03.pdf output.txt
13B it prints the document text in PowerShell and then prints this below it
(which I have never seen before):
{quote}Exception in thread "main" java.net.MalformedURLException: no protocol:
output.txt
at java.base/java.net.URL.<init>(URL.java:672)
at java.base/java.net.URL.<init>(URL.java:568)
at java.base/java.net.URL.<init>(URL.java:515)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:488)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
{quote}
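The MalformedURLException in 13B names output.txt, which suggests the command line treated the trailing output.txt as a second input rather than as an output file. That is an inference from the stack trace, not a confirmed reading of TikaCLI. Since -t already prints the text to standard output (as 13B describes), one way to capture it is to redirect that stream. A minimal Python sketch follows; the function names are hypothetical and java is assumed to be on PATH.

```python
import subprocess
from pathlib import Path


def tika_text_command(jar: str, document: str) -> list:
    """Build the tika-app call from step 13A, without the trailing output.txt."""
    return ["java", "-jar", jar, "-t", document]


def extract_text_to_file(jar: str, document: str, output: str) -> None:
    """Run tika-app and write whatever it prints on stdout to the output file."""
    with Path(output).open("wb") as out:
        # check=True raises CalledProcessError if java exits with a non-zero status.
        subprocess.run(tika_text_command(jar, document), stdout=out, check=True)


# Example call (not executed here):
# extract_text_to_file("tika-app-1.24.1.jar", "Test03.pdf", "output.txt")
```

From PowerShell, the same idea would be a redirect such as java -jar tika-app-1.24.1.jar -t Test03.pdf > output.txt.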
> Caused by: java.lang.NullPointerException (and more!)
> -----------------------------------------------------
>
> Key: TIKA-3319
> URL: https://issues.apache.org/jira/browse/TIKA-3319
> Project: Tika
> Issue Type: Bug
> Components: general
> Affects Versions: 1.24.1
> Environment: Windows 10
> Tika 1.24.1.jar
> Tika 1.24 python module
> python 3.9.2
> tesseract-ocr-w64-setup-v5.0.0-alpha.20201127
> (anything else that may be relevant?)
> Reporter: Richard Kraus
> Priority: Major
>