[ https://issues.apache.org/jira/browse/TIKA-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas DiPiazza updated TIKA-3223:
------------------------------------
Description:
Using this project as an example:
https://github.com/nddipiazza/tika-fork-parser-example
The problem occurs when the parser encounters a file that legitimately throws
an exception. For example, 002164.ppt from digicorpa is encrypted, so parsing
it should throw.
When you use this constructor, you get the expected result:
{code}
try (ForkParser forkParser = new ForkParser(getClass().getClassLoader())) {
{code}
When you use this constructor instead:
{code}
try (ForkParser forkParser = new ForkParser(Paths.get(pathToTikaMainDist),
        new ParserFactoryFactory("org.apache.tika.fork.main.CollectingParserFactory", parserArgs))) {
{code}
You will get a ClassNotFoundException: the client fails to deserialize the
exception sent back by the forked process.
{code}
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:36) ~[classes/:?]
    at org.apache.tika.client.TikaAsyncMain.lambda$runThreads$0(TikaAsyncMain.java:317) ~[classes/:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException: Unable to deserialize an exception
    at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:295) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
    ... 8 more
Caused by: java.lang.ClassNotFoundException: org/apache/tika/exception/EncryptedDocumentException
    at java.lang.Class.forName0(Native Method) ~[?:?]
    at java.lang.Class.forName(Class.java:398) ~[?:?]
    at org.apache.tika.fork.ForkObjectInputStream.resolveClass(ForkObjectInputStream.java:69) ~[tika-core-1.24.1.jar:1.24.1]
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1995) ~[?:?]
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862) ~[?:?]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169) ~[?:?]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:493) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:451) ~[?:?]
    at org.apache.tika.fork.ForkObjectInputStream.readObject(ForkObjectInputStream.java:110) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:292) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
    ... 8 more
{code}
But I definitely have the exception type on the classpath. The same thing
happens for any Tika exception, not just EncryptedDocumentException.
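For illustration only, here is a minimal sketch of how a class can be on the application classpath and still fail to resolve during deserialization. It is not Tika code, and I have not confirmed that this is what ForkObjectInputStream does here. The names ResolveClassDemo, DemoException and LoaderObjectInputStream are hypothetical. The sketch assumes an ObjectInputStream subclass that resolves classes against one explicit ClassLoader: when that loader can see the exception class, deserialization works, and when it cannot, readObject fails with a ClassNotFoundException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

public class ResolveClassDemo {

    /** Hypothetical stand-in for an exception thrown by a parser. */
    public static class DemoException extends Exception {
        private static final long serialVersionUID = 1L;

        public DemoException(String message) {
            super(message);
        }
    }

    /** Hypothetical stream that resolves every class against one explicit loader. */
    static class LoaderObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // Only classes visible to 'loader' (or its parents) can be resolved here.
            return Class.forName(desc.getName(), false, loader);
        }
    }

    public static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    public static Object deserialize(byte[] data, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new LoaderObjectInputStream(new ByteArrayInputStream(data), loader)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new DemoException("encrypted"));

        // The loader that loaded this class can also see DemoException.
        Object ok = deserialize(bytes, ResolveClassDemo.class.getClassLoader());
        System.out.println("visible loader: " + ok);

        // The parent of the system loader does not see application classes.
        ClassLoader restricted = ClassLoader.getSystemClassLoader().getParent();
        try {
            deserialize(bytes, restricted);
            System.out.println("restricted loader: unexpectedly succeeded");
        } catch (ClassNotFoundException e) {
            System.out.println("restricted loader: " + e);
        }
    }
}
```

If something similar is happening in the ForkParser(Path, ParserFactoryFactory) case, the loader used to read the response would be the place to look, but that is a guess on my part.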
> ForkParser cannot serialize exceptions when using the ForkParser(Path,
> ParserFactoryFactory)
> --------------------------------------------------------------------------------------------
>
> Key: TIKA-3223
> URL: https://issues.apache.org/jira/browse/TIKA-3223
> Project: Tika
> Issue Type: Bug
> Components: core
> Reporter: Nicholas DiPiazza
> Priority: Major