[ https://issues.apache.org/jira/browse/TIKA-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison reopened TIKA-1132:
-------------------------------
Assignee: Tim Allison
Will add a test case in Tika.
> Parsing some XLS documents hangs entire JVM, requires kill -9
> -------------------------------------------------------------
>
> Key: TIKA-1132
> URL: https://issues.apache.org/jira/browse/TIKA-1132
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.2, 1.3
> Environment: Linux Suse:
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
> OSX 10.8.3:
> java version "1.7.0_06"
> Java(TM) SE Runtime Environment (build 1.7.0_06-b24)
> Java HotSpot(TM) 64-Bit Server VM (build 23.2-b09, mixed mode)
> Reporter: Ryan Krueger
> Assignee: Tim Allison
> Fix For: 1.5
>
> Attachments: mod3.xlsx, mod.xls
>
>
> Some XLS documents hang the entire JVM. A Control-C or a regular kill won't
> stop the JVM; a kill -9 is required.
> We're running Tika within an email server application that parses documents to
> extract the text of all attachments. When we hit a message with an affected
> attachment, the entire JVM hangs, and we mark the message so that text
> extraction is skipped on the next attempt. Unfortunately, the hang stops all
> email processing on the server until the internal watchdogs kill -9 the application.
> We have seen the issue for several months with different documents, but they
> are always Excel files. Excel complains when opening some of them, but not
> all.
> In addition to experiencing the problem on our Linux servers, I have tested on
> OSX and experienced the same problem. Whether I run the Tika UI and select the
> affected file or run the CLI, the result is the same.
> Tested with java -jar /path/to/tika-app-1.3.jar -t /path/to/file.xls
> On multi-CPU machines, two threads run at 100% CPU
> every time.
> I have attached a document that triggers the error.
> I have tested with 1.2 and 1.3 with the same result. With 1.1, the text is
> extracted correctly.