Nicholas DiPiazza created TIKA-3223:
---------------------------------------

             Summary: ForkParser cannot serialize exceptions when using the ForkParser(Path, ParserFactoryFactory) constructor
                 Key: TIKA-3223
                 URL: https://issues.apache.org/jira/browse/TIKA-3223
             Project: Tika
          Issue Type: Bug
          Components: core
            Reporter: Nicholas DiPiazza


I am using this project as an example:
https://github.com/nddipiazza/tika-fork-parser-example

Problems occur when ForkParser encounters files that legitimately throw an exception. For example, 002164.ppt from digitalcorpora is encrypted, so parsing it should throw.

When you use this constructor, you get the expected result:

{code}
try (ForkParser forkParser = new ForkParser(getClass().getClassLoader())) {
{code}

However, when you use this constructor:

{code}
try (ForkParser forkParser = new ForkParser(Paths.get(pathToTikaMainDist),
        new ParserFactoryFactory("org.apache.tika.fork.main.CollectingParserFactory", parserArgs))) {
{code}

you get a ClassNotFoundException instead, because the exception thrown in the forked process cannot be deserialized:

{code}
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:36) ~[classes/:?]
        at org.apache.tika.client.TikaAsyncMain.lambda$runThreads$0(TikaAsyncMain.java:317) ~[classes/:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException: Unable to deserialize an exception
        at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:295) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
        ... 8 more
Caused by: java.lang.ClassNotFoundException: org/apache/tika/exception/EncryptedDocumentException
        at java.lang.Class.forName0(Native Method) ~[?:?]
        at java.lang.Class.forName(Class.java:398) ~[?:?]
        at org.apache.tika.fork.ForkObjectInputStream.resolveClass(ForkObjectInputStream.java:69) ~[tika-core-1.24.1.jar:1.24.1]
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1995) ~[?:?]
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862) ~[?:?]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169) ~[?:?]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:493) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:451) ~[?:?]
        at org.apache.tika.fork.ForkObjectInputStream.readObject(ForkObjectInputStream.java:110) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:292) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
        ... 8 more
{code}

But those exception classes are definitely on both classpaths.
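A minimal, stdlib-only sketch of the failure mode visible in the trace: the ForkObjectInputStream.resolveClass frame calls Class.forName, so whether a class deserializes depends on which ClassLoader the lookup goes through, not only on what is on the JVM classpath. This sketch assumes resolution against a single fixed ClassLoader; LoaderBoundInputStream and FakeEncryptedException are hypothetical stand-ins, not Tika classes, and this is not Tika's actual code.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.UncheckedIOException;

class LoaderBoundDeserialization {

    /** Hypothetical stand-in for an application exception such as EncryptedDocumentException. */
    static class FakeEncryptedException extends Exception {
        private static final long serialVersionUID = 1L;

        FakeEncryptedException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical stream that resolves every class against one fixed ClassLoader.
     * Assumption: it only mirrors the resolveClass -> Class.forName frames in the trace.
     */
    static class LoaderBoundInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderBoundInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // Only classes visible from 'loader' resolve, whatever else is on the JVM classpath.
            return Class.forName(desc.getName(), false, loader);
        }
    }

    /** Serializes an object with the standard ObjectOutputStream. */
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns true if the bytes deserialize when classes are resolved through 'loader'. */
    static boolean canDeserialize(byte[] data, ClassLoader loader) {
        try (ObjectInputStream in = new LoaderBoundInputStream(new ByteArrayInputStream(data), loader)) {
            in.readObject();
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader appLoader = LoaderBoundDeserialization.class.getClassLoader();
        // A null parent means this loader only sees bootstrap (JDK) classes.
        ClassLoader jdkOnlyLoader = new ClassLoader(null) {};

        byte[] appException = serialize(new FakeEncryptedException("document is encrypted"));
        byte[] jdkException = serialize(new IOException("plain I/O failure"));

        System.out.println("app exception, app loader:      " + canDeserialize(appException, appLoader));
        System.out.println("app exception, JDK-only loader: " + canDeserialize(appException, jdkOnlyLoader));
        System.out.println("JDK exception, JDK-only loader: " + canDeserialize(jdkException, jdkOnlyLoader));
    }
}
{code}

In this sketch the application-defined exception deserializes through the application class loader but fails with ClassNotFoundException through the JDK-only loader, while a JDK exception such as IOException deserializes through either.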



--
This message was sent by Atlassian Jira
(v8.3.4#803005)