Nicholas DiPiazza created TIKA-3220:
---------------------------------------

             Summary: ForkParser displays incorrect message when parse timeout is reached
                 Key: TIKA-3220
                 URL: https://issues.apache.org/jira/browse/TIKA-3220
             Project: Tika
          Issue Type: Bug
            Reporter: Nicholas DiPiazza


Build this ForkParser example:

https://github.com/nddipiazza/tika-fork-parser-example

but set the server wait timeout to 10 seconds (10000 ms):

{code}
    forkParser.setServerWaitTimeoutMillis(10000);
{code}

Now run it against the following openly licensed XLS file:
https://public.opendatasoft.com/explore/dataset/activite-epidemique-covid-19-departement-france/download/?format=xls&timezone=America/Chicago&lang=en&use_labels_for_header=true

Expected Result:

Parsing should stop once the maximum time is reached and return the bytes parsed so far.
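
To make the expected behavior concrete, here is a minimal sketch using only java.util.concurrent. It is not Tika's implementation: ForkParser runs the parser in a separate JVM, whereas this sketch uses a worker thread in the same process. The class TimeoutParseSketch and the methods slowParse and parseWithTimeout are hypothetical names invented for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Tika.
class TimeoutParseSketch {

    // Stand-in for a slow parser: appends one chunk of "content" per step.
    static void slowParse(StringBuffer out, int chunks, long millisPerChunk)
            throws InterruptedException {
        for (int i = 0; i < chunks; i++) {
            Thread.sleep(millisPerChunk); // throws if the worker is interrupted
            out.append("chunk").append(i).append(' ');
        }
    }

    // Runs the parse on a worker thread and gives up after timeoutMillis,
    // returning whatever content was collected up to that point.
    static String parseWithTimeout(int chunks, long millisPerChunk, long timeoutMillis)
            throws Exception {
        StringBuffer collected = new StringBuffer(); // synchronized, safe to share
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Object> task = executor.submit(() -> {
            slowParse(collected, chunks, millisPerChunk);
            return null;
        });
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Timeout reached: interrupt the worker instead of failing the call.
            task.cancel(true);
        } finally {
            executor.shutdownNow();
        }
        return collected.toString();
    }
}
```

Calling TimeoutParseSketch.parseWithTimeout(50, 100, 550) returns only the first few chunks instead of throwing, which is the shape of result this report asks for: partial content plus a clear indication that the timeout, not a crash, ended the parse.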

Actual result:

{code}
/home/ndipiazza/lucidworks/tika-fork-parser-example/tika-fork-main/build/dist 
/home/ndipiazza/Downloads/coronavirus-tranche-age-urgences-sosmedecins-dep-france.xls
{code}

The parse fails with the following error, which blames a crashed process rather than the timeout:

{code}
Exception in thread "main" org.apache.tika.exception.TikaException: Could not parse
        at org.apache.tika.client.CollectingParser.parseInternal(CollectingParser.java:104)
        at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:70)
        at org.apache.tika.client.TikaForkExample.main(TikaForkExample.java:49)
Caused by: org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275)
        at org.apache.tika.client.CollectingParser.parseInternal(CollectingParser.java:101)
        ... 2 more
Caused by: java.io.IOException: Lost connection to a forked server process
        at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:284)
        at org.apache.tika.fork.ForkClient.call(ForkClient.java:209)
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267)
        ... 3 more
{code}



