[ https://issues.apache.org/jira/browse/TIKA-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226837#comment-17226837 ]
Nicholas DiPiazza commented on TIKA-3220:
-----------------------------------------
Yeah, I was thinking #1 when I opened the ticket. Figured it's just a logging
change so that you don't go down the wrong rabbit hole.
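
To make the "just a logging change" idea concrete, here is a rough sketch of how the message could be chosen. Everything in it is an assumption for illustration: the class ForkFailureMessage and its describe method are hypothetical and not part of Tika, and I have not checked how ForkClient actually tracks elapsed time. It only shows comparing the time spent waiting against the value given to setServerWaitTimeoutMillis before blaming a crash.

```java
import java.io.IOException;

/**
 * Hypothetical helper, NOT part of Tika: chooses an error message for a lost
 * connection to a forked parser process, based on how long the caller waited.
 */
public final class ForkFailureMessage {

    private ForkFailureMessage() {
    }

    /**
     * @param elapsedMillis time spent waiting for the forked process (assumed to be measured by the caller)
     * @param timeoutMillis configured wait timeout, e.g. the value passed to setServerWaitTimeoutMillis
     * @param cause         the low-level I/O failure that was observed
     */
    public static String describe(long elapsedMillis, long timeoutMillis, IOException cause) {
        // If the wait lasted at least as long as the configured timeout,
        // report a timeout instead of guessing at a crash.
        if (timeoutMillis > 0 && elapsedMillis >= timeoutMillis) {
            return "Parse exceeded the configured timeout of " + timeoutMillis
                    + " ms (waited " + elapsedMillis + " ms).";
        }
        // Otherwise fall back to a crash-style message and keep the cause visible.
        return "Failed to communicate with a forked parser process ("
                + cause.getMessage()
                + "). The process may have crashed, for example by running out of memory.";
    }

    public static void main(String[] args) {
        IOException lost = new IOException("Lost connection to a forked server process");
        System.out.println(describe(10_050, 10_000, lost)); // waited past the timeout
        System.out.println(describe(1_200, 10_000, lost));  // failed well before the timeout
    }
}
```

With something like this, the 10 second case from the ticket would mention the timeout, while a real early crash would still get the existing kind of message.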
> ForkParser displays incorrect message when parse timeout is reached
> -------------------------------------------------------------------
>
> Key: TIKA-3220
> URL: https://issues.apache.org/jira/browse/TIKA-3220
> Project: Tika
> Issue Type: Bug
> Reporter: Nicholas DiPiazza
> Priority: Major
>
> Build this ForkParser example
> https://github.com/nddipiazza/tika-fork-parser-example
> but change the server timeout to be 10 seconds.
> {code}
> forkParser.setServerWaitTimeoutMillis(10000);
> {code}
> Now run it against the following openly licensed xls file:
> https://public.opendatasoft.com/explore/dataset/activite-epidemique-covid-19-departement-france/download/?format=xls&timezone=America/Chicago&lang=en&use_labels_for_header=true
> The purpose of this is to test the timeout feature on the ForkParser.
> {code}
> /home/ndipiazza/lucidworks/tika-fork-parser-example/tika-fork-main/build/dist
> /home/ndipiazza/Downloads/coronavirus-tranche-age-urgences-sosmedecins-dep-france.xls
> {code}
> Expected Result:
> Stop parsing once the maximum time is reached, and either return the bytes
> parsed so far or throw an error whose message states that the timeout was exceeded.
> Actual result:
> You get the following error message.
> {code}
> Exception in thread "main" org.apache.tika.exception.TikaException: Could not parse
> at org.apache.tika.client.CollectingParser.parseInternal(CollectingParser.java:104)
> at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:70)
> at org.apache.tika.client.TikaForkExample.main(TikaForkExample.java:49)
> Caused by: org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
> at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275)
> at org.apache.tika.client.CollectingParser.parseInternal(CollectingParser.java:101)
> ... 2 more
> Caused by: java.io.IOException: Lost connection to a forked server process
> at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:284)
> at org.apache.tika.fork.ForkClient.call(ForkClient.java:209)
> at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267)
> ... 3 more
> {code}
> If you increase the timeout, the file parses fine. It is not a memory issue.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)