Nicholas DiPiazza created TIKA-3223:
---------------------------------------
Summary: ForkParser cannot serialize exceptions when using the
ForkParser(Path, ParserFactoryFactory)
Key: TIKA-3223
URL: https://issues.apache.org/jira/browse/TIKA-3223
Project: Tika
Issue Type: Bug
Components: core
Reporter: Nicholas DiPiazza
Using this project as an example:
https://github.com/nddipiazza/tika-fork-parser-example
Problems occur when the parser encounters a file that legitimately throws an
exception. For example, 002164.ppt from digicorpa is encrypted, so parsing it
should throw.
When you use this constructor, you get the expected result:
{code}
try (ForkParser forkParser = new ForkParser(getClass().getClassLoader())) {
{code}
When you use this constructor instead:
{code}
try (ForkParser forkParser = new ForkParser(Paths.get(pathToTikaMainDist),
        new ParserFactoryFactory("org.apache.tika.fork.main.CollectingParserFactory", parserArgs))) {
{code}
You get a ClassNotFoundException instead: the exception thrown in the forked
process cannot be deserialized.
{code}
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:36) ~[classes/:?]
    at org.apache.tika.client.TikaAsyncMain.lambda$runThreads$0(TikaAsyncMain.java:317) ~[classes/:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException: Unable to deserialize an exception
    at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:295) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
    ... 8 more
Caused by: java.lang.ClassNotFoundException: org/apache/tika/exception/EncryptedDocumentException
    at java.lang.Class.forName0(Native Method) ~[?:?]
    at java.lang.Class.forName(Class.java:398) ~[?:?]
    at org.apache.tika.fork.ForkObjectInputStream.resolveClass(ForkObjectInputStream.java:69) ~[tika-core-1.24.1.jar:1.24.1]
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1995) ~[?:?]
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862) ~[?:?]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169) ~[?:?]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:493) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:451) ~[?:?]
    at org.apache.tika.fork.ForkObjectInputStream.readObject(ForkObjectInputStream.java:110) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:292) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
    ... 8 more
{code}
But I definitely have those exception classes on both classpaths.
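
The trace shows ForkObjectInputStream.resolveClass failing inside Class.forName, which suggests the class loader used during deserialization may not be able to see the exception class, even if a jar containing it is on the classpath. The following is a minimal, self-contained sketch of that general failure mode. It is not Tika's code: LoaderDemo, DemoException and LoaderObjectInputStream are hypothetical names used only for illustration, and the isolated loader is an assumption standing in for whatever loader the client ends up using.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderDemo {

    // Hypothetical stand-in for an application exception such as
    // EncryptedDocumentException.
    public static class DemoException extends Exception {
        private static final long serialVersionUID = 1L;

        public DemoException(String message) {
            super(message);
        }
    }

    // ObjectInputStream that resolves every class against one explicit loader.
    static class LoaderObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // Lookup succeeds only if this loader (or one of its parents) sees the class.
            return Class.forName(desc.getName(), false, loader);
        }
    }

    // Serializes an object to a byte array with standard Java serialization.
    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Returns true if the bytes deserialize with the given loader,
    // false if a class in the stream cannot be resolved by it.
    public static boolean canDeserialize(byte[] bytes, ClassLoader loader) throws IOException {
        try (ObjectInputStream in =
                new LoaderObjectInputStream(new ByteArrayInputStream(bytes), loader)) {
            in.readObject();
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new DemoException("encrypted"));
        ClassLoader appLoader = LoaderDemo.class.getClassLoader();
        // A loader with no URLs and no application parent: it sees only JDK classes.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println("application loader: " + canDeserialize(bytes, appLoader));
            System.out.println("isolated loader:    " + canDeserialize(bytes, isolated));
        }
    }
}
```

In this sketch the same bytes deserialize with the application loader but not with the isolated one, so a ClassNotFoundException at this point does not by itself prove the class is missing from the classpath. Whether that is what happens with the ForkParser(Path, ParserFactoryFactory) constructor still needs to be confirmed against the Tika source.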