[ https://issues.apache.org/jira/browse/TIKA-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas DiPiazza updated TIKA-3223:
------------------------------------
    Description: 
Using this project as an example 
https://github.com/nddipiazza/tika-fork-parser-example

Problems occur when ForkParser encounters files that legitimately throw an 
exception. For example, 002164.ppt from digicorpa is encrypted, so parsing it 
should throw.

When you use this constructor, you get the expected result:

{code}
try (ForkParser forkParser = new ForkParser(getClass().getClassLoader())) {
{code}

When you use this constructor instead:

{code}
try (ForkParser forkParser = new ForkParser(
        Paths.get(pathToTikaMainDist),
        new ParserFactoryFactory("org.apache.tika.fork.main.CollectingParserFactory", parserArgs))) {
{code}

You get a ClassNotFoundException because the exception thrown in the forked 
process cannot be deserialized on the client side:

{code}
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:36) ~[classes/:?]
        at org.apache.tika.client.TikaAsyncMain.lambda$runThreads$0(TikaAsyncMain.java:317) ~[classes/:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException: Unable to deserialize an exception
        at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:295) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
        ... 8 more
Caused by: java.lang.ClassNotFoundException: org/apache/tika/exception/EncryptedDocumentException
        at java.lang.Class.forName0(Native Method) ~[?:?]
        at java.lang.Class.forName(Class.java:398) ~[?:?]
        at org.apache.tika.fork.ForkObjectInputStream.resolveClass(ForkObjectInputStream.java:69) ~[tika-core-1.24.1.jar:1.24.1]
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1995) ~[?:?]
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862) ~[?:?]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169) ~[?:?]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:493) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:451) ~[?:?]
        at org.apache.tika.fork.ForkObjectInputStream.readObject(ForkObjectInputStream.java:110) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:292) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
        ... 8 more
{code}

But the exception class is definitely on my classpath. The same thing happens 
for any Tika exception; it is not limited to EncryptedDocumentException.
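
The trace above ends in a Class.forName call made from a resolveClass override, so one possible reading is that the class loader used for deserialization does not see the exception class even though the application classpath contains it. The sketch below illustrates that general mechanism with JDK classes only. It is not Tika's ForkObjectInputStream; the names ForkDeserializationSketch, DemoParseException, LoaderBoundInputStream and isolatedLoader are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.net.URL;
import java.net.URLClassLoader;

final class ForkDeserializationSketch {

    /** Hypothetical stand-in for a parser exception such as EncryptedDocumentException. */
    static class DemoParseException extends Exception {
        private static final long serialVersionUID = 1L;

        DemoParseException(String message) {
            super(message);
        }
    }

    /** ObjectInputStream that resolves every class through one explicit loader. */
    static final class LoaderBoundInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderBoundInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // Throws ClassNotFoundException when the loader cannot see the class,
            // regardless of what the application classpath contains.
            return Class.forName(desc.getName(), false, loader);
        }
    }

    /** Serializes a throwable to bytes, as a child process might before sending it back. */
    static byte[] serialize(Throwable t) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        return bytes.toByteArray();
    }

    /** Deserializes a throwable, resolving classes only through the given loader. */
    static Throwable deserialize(byte[] data, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new LoaderBoundInputStream(new ByteArrayInputStream(data), loader)) {
            return (Throwable) in.readObject();
        }
    }

    /** Loader with no URLs and no application parent: it sees only JDK bootstrap classes. */
    static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new DemoParseException("document is encrypted"));

        // The loader that defined the exception class can resolve it again.
        Throwable restored = deserialize(data, ForkDeserializationSketch.class.getClassLoader());
        System.out.println("restored: " + restored);

        // A loader that cannot see the class fails during resolveClass.
        try {
            deserialize(data, isolatedLoader());
        } catch (ClassNotFoundException e) {
            System.out.println("failed: " + e);
        }
    }
}
```

If the same effect is at work here, the thing to check would be which class loader ForkObjectInputStream passes to Class.forName under each constructor, rather than the contents of the classpath itself.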


> ForkParser cannot serialize exceptions when using the ForkParser(Path, 
> ParserFactoryFactory)
> --------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3223
>                 URL: https://issues.apache.org/jira/browse/TIKA-3223
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>
> Using this project as an example 
> https://github.com/nddipiazza/tika-fork-parser-example
> Problems occur when ForkParser encounters files that legitimately throw an 
> exception. For example, 002164.ppt from digicorpa is encrypted, so parsing 
> it should throw.
> When you use this constructor, you get the expected result:
> {code}
> try (ForkParser forkParser = new ForkParser(getClass().getClassLoader())) {
> {code}
> When you use this constructor instead:
> {code}
> try (ForkParser forkParser = new ForkParser(
>         Paths.get(pathToTikaMainDist),
>         new ParserFactoryFactory("org.apache.tika.fork.main.CollectingParserFactory", parserArgs))) {
> {code}
> You get a ClassNotFoundException because the exception thrown in the forked 
> process cannot be deserialized on the client side:
> {code}
> org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
>       at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275) ~[tika-core-1.24.1.jar:1.24.1]
>       at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:36) ~[classes/:?]
>       at org.apache.tika.client.TikaAsyncMain.lambda$runThreads$0(TikaAsyncMain.java:317) ~[classes/:?]
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>       at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
>       at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>       at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.io.IOException: Unable to deserialize an exception
>       at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:295) ~[tika-core-1.24.1.jar:1.24.1]
>       at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
>       at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
>       ... 8 more
> Caused by: java.lang.ClassNotFoundException: org/apache/tika/exception/EncryptedDocumentException
>       at java.lang.Class.forName0(Native Method) ~[?:?]
>       at java.lang.Class.forName(Class.java:398) ~[?:?]
>       at org.apache.tika.fork.ForkObjectInputStream.resolveClass(ForkObjectInputStream.java:69) ~[tika-core-1.24.1.jar:1.24.1]
>       at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1995) ~[?:?]
>       at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862) ~[?:?]
>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169) ~[?:?]
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) ~[?:?]
>       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:493) ~[?:?]
>       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:451) ~[?:?]
>       at org.apache.tika.fork.ForkObjectInputStream.readObject(ForkObjectInputStream.java:110) ~[tika-core-1.24.1.jar:1.24.1]
>       at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:292) ~[tika-core-1.24.1.jar:1.24.1]
>       at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
>       at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
>       ... 8 more
> {code}
> But the exception class is definitely on my classpath. The same thing happens 
> for any Tika exception; it is not limited to EncryptedDocumentException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
