[ 
https://issues.apache.org/jira/browse/TIKA-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226837#comment-17226837
 ] 

Nicholas DiPiazza commented on TIKA-3220:
-----------------------------------------

Yeah, I was thinking #1 when I opened the ticket. Figured it's just a logging 
change so that you don't go down the wrong rabbit hole. 

> ForkParser displays incorrect message when parse timeout is reached
> -------------------------------------------------------------------
>
>                 Key: TIKA-3220
>                 URL: https://issues.apache.org/jira/browse/TIKA-3220
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>
> Build this ForkParser example
> https://github.com/nddipiazza/tika-fork-parser-example
> but change the server timeout to be 10 seconds.
> {code}
>     forkParser.setServerWaitTimeoutMillis(10000);
> {code}
> Now run it against the following file (an openly licensed XLS file):
> https://public.opendatasoft.com/explore/dataset/activite-epidemique-covid-19-departement-france/download/?format=xls&timezone=America/Chicago&lang=en&use_labels_for_header=true
> The purpose of this is to test the timeout feature on the ForkParser.
> {code}
> /home/ndipiazza/lucidworks/tika-fork-parser-example/tika-fork-main/build/dist 
> /home/ndipiazza/Downloads/coronavirus-tranche-age-urgences-sosmedecins-dep-france.xls
> {code}
> Expected Result:
> Parsing stops once the timeout is reached and either returns the bytes 
> parsed so far or throws an error whose message states that the timeout was 
> exceeded. 
> Actual result:
> You get the following error message.
> {code}
> Exception in thread "main" org.apache.tika.exception.TikaException: Could not parse
>       at org.apache.tika.client.CollectingParser.parseInternal(CollectingParser.java:104)
>       at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:70)
>       at org.apache.tika.client.TikaForkExample.main(TikaForkExample.java:49)
> Caused by: org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
>       at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275)
>       at org.apache.tika.client.CollectingParser.parseInternal(CollectingParser.java:101)
>       ... 2 more
> Caused by: java.io.IOException: Lost connection to a forked server process
>       at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:284)
>       at org.apache.tika.fork.ForkClient.call(ForkClient.java:209)
>       at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267)
>       ... 3 more
> {code}
> If you increase the timeout, the file parses fine. It is not a memory issue. 
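
Until the message is fixed, a caller could guess at the cause by timing the 
parse call. The sketch below is only an illustration and is not Tika code: 
the class ForkFailureClassifier and its methods are hypothetical, the message 
text is copied from the stack trace above, and comparing elapsed time against 
the configured timeout is a heuristic (a real crash that happens after the 
timeout would be misreported).

```java
import java.io.IOException;

// Hypothetical helper, NOT part of Tika. It only illustrates the kind of
// message the report asks for when the server wait timeout is exceeded.
final class ForkFailureClassifier {

    // Assumption: wording copied from the stack trace in this report.
    static final String LOST_CONNECTION = "Lost connection to a forked server process";

    /** Returns true if any exception in the cause chain carries the lost-connection message. */
    static boolean lostConnection(Throwable failure) {
        for (Throwable cause = failure; cause != null; cause = cause.getCause()) {
            String message = cause.getMessage();
            if (message != null && message.contains(LOST_CONNECTION)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Builds a message that names the timeout when the connection was lost
     * at or after the configured limit. Heuristic only.
     */
    static String describe(Throwable failure, long elapsedMillis, long timeoutMillis) {
        if (lostConnection(failure) && elapsedMillis >= timeoutMillis) {
            return "Parse probably exceeded the timeout of " + timeoutMillis
                    + " ms (ran for " + elapsedMillis + " ms)";
        }
        return "Parse failed after " + elapsedMillis + " ms: " + failure.getMessage();
    }

    public static void main(String[] args) {
        // Mimic the nesting seen in the report: a wrapper around an IOException.
        Exception failure = new Exception("Could not parse", new IOException(LOST_CONNECTION));
        System.out.println(describe(failure, 10_200, 10_000)); // at or past the limit
        System.out.println(describe(failure, 1_500, 10_000));  // well before the limit
    }
}
```

In the example project, elapsedMillis would be measured around the 
forkParser.parse(...) call and timeoutMillis would be the value passed to 
setServerWaitTimeoutMillis.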


