[ https://issues.apache.org/jira/browse/TIKA-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680747#comment-13680747 ]

Ryan Krueger commented on TIKA-1132:
------------------------------------

Running jvisualvm and pulling a thread dump I get the same trace each time:

"main" prio=10 tid=0x0000000000606800 nid=0x7799 runnable [0x00007fe26bf1d000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.poi.ss.usermodel.DataFormatter$FractionFormat.format(DataFormatter.java:1009)
        at org.apache.poi.ss.usermodel.DataFormatter$FractionFormat.format(DataFormatter.java:1033)
        at java.text.Format.format(Format.java:157)
        at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:699)
        at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents(DataFormatter.java:669)
        at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.formatNumberDateCell(FormatTrackingHSSFListener.java:129)
        at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:419)
        at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:323)
        at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:82)
        at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:112)
        at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:147)
        at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:106)
        at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:299)
        at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:151)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:194)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:139)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:415)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:109)




Looking at POI 3.8 in grepcode I see the affected code.  The methods appear to 
be unchanged in 3.9.

I don't know what's causing the issue as it doesn't immediately appear to me to 
be an infinite loop.

Here is the relevant section from org.apache.poi.ss.usermodel.DataFormatter 
(the trace points at line 1009, the inner loop):

1005             double minVal = 1.0;
1006             double currDenom = Math.pow(10 ,  fractParts[1].length()) - 1d;
1007             double currNeum = 0;
1008             for (int i = (int)(Math.pow(10,  fractParts[1].length())- 1d); i > 0; i--) {
1009                for(int i2 = (int)(Math.pow(10,  fractParts[1].length())- 1d); i2 > 0; i2--){
1010                   if (minVal >=  Math.abs((double)i2/(double)i - decPart)) {
1011                      currDenom = i;
1012                      currNeum = i2;
1013                      minVal = Math.abs((double)i2/(double)i  - decPart);
1014                   }
1015                }
1016             }
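
One thing that does stand out in the snippet above: both loops start at 
(int)(Math.pow(10, fractParts[1].length()) - 1d), so the body runs roughly 
(10^n - 1)^2 times, where n is the length of fractParts[1]. The sketch below 
only illustrates that cost; it is not POI code. The class and method names are 
made up, and whether mod.xls really contains a fraction format with a long 
denominator is an assumption I have not verified. It also shows one possible 
cheaper search (a single pass over the denominators, rounding to the nearest 
numerator), which assumes decPart is in [0, 1) and is not claimed to match the 
original tie-breaking exactly.

```java
public class FractionLoopCost {

    // Mirrors the loop bound in the quoted snippet:
    // (int)(Math.pow(10, fractParts[1].length()) - 1d).
    // A double that exceeds the int range saturates at Integer.MAX_VALUE.
    static int loopBound(int denomDigits) {
        return (int) (Math.pow(10, denomDigits) - 1d);
    }

    // Number of times the body of the nested loops executes.
    static long totalIterations(int denomDigits) {
        long bound = loopBound(denomDigits);
        return bound * bound;
    }

    // Hypothetical alternative (not the POI implementation): visit each
    // denominator once and take the nearest numerator directly.
    // Returns {numerator, denominator}. Assumes 0 <= decPart < 1.
    static long[] nearestFraction(double decPart, int denomDigits) {
        int bound = loopBound(denomDigits);
        double minVal = 1.0;
        long bestNum = 0;
        long bestDenom = bound;
        for (int d = bound; d > 0; d--) {
            // The original loops never try a numerator of 0, so clamp to 1.
            long num = Math.max(1L, Math.round(decPart * d));
            double err = Math.abs((double) num / (double) d - decPart);
            if (minVal >= err) {
                minVal = err;
                bestNum = num;
                bestDenom = d;
            }
        }
        return new long[] {bestNum, bestDenom};
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.printf("digits=%d bound=%d iterations=%d%n",
                    n, loopBound(n), totalIterations(n));
        }
    }
}
```

For n = 2 the nested loops run 9,801 times and for n = 5 about 10^10 times. 
From n = 10 on, the int cast saturates at Integer.MAX_VALUE and the count is 
about 4.6 * 10^18, which would look like a hang even though the loops are finite.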
                
> Parsing some XLS documents hangs entire JVM, requires kill -9
> -------------------------------------------------------------
>
>                 Key: TIKA-1132
>                 URL: https://issues.apache.org/jira/browse/TIKA-1132
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.2, 1.3
>         Environment: Linux Suse:
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
> OSX 10.8.3:
> java version "1.7.0_06"
> Java(TM) SE Runtime Environment (build 1.7.0_06-b24)
> Java HotSpot(TM) 64-Bit Server VM (build 23.2-b09, mixed mode)
>            Reporter: Ryan Krueger
>             Fix For: 1.1
>
>         Attachments: mod.xls
>
>
> Some XLS documents hang the entire JVM.  A control-C or regular kill won't 
> stop the JVM, a kill -9 is required.
> We're running within an email server application, parsing documents to 
> extract the text of all attachments.  When we hit a message with an affected 
> attachment, the entire JVM hangs, and we mark the message so that text 
> extraction is skipped on the next attempt.  Unfortunately, the hang stops all 
> email processing on the server until the internal watchdogs kill -9 the 
> application.
> We have seen the issue for several months with different documents, but they 
> are always Excel files.  Excel complains when opening some of them, but not 
> all.
> In addition to experiencing the problem on our Linux servers, I have tested on 
> OSX and experienced the same problem.  Whether I select the affected file in 
> the Tika UI or run the CLI, the result is the same.
> Tested with java -jar /path/to/tika-app-1.3.jar -t /path/to/file.xls
> When running on multi-CPU machines there are two threads running at 100% 
> every time.
> I have attached a document that triggers the error.
> I have tested with 1.2 and 1.3 with the same result.  Running 1.1 the text is 
> accurately extracted.
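
Until the loop itself is addressed, one way to keep a stuck parse from taking 
down the whole application, as described in the report above, might be to run 
the extraction in a child process with a time limit. This is a minimal sketch, 
not anything Tika provides: the class name, method name, and timeout are 
hypothetical, and the command line is the one from the report.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class IsolatedParse {

    // Runs the command in a child process and writes its output to a file.
    // Returns true if the process exits with status 0 within the time limit;
    // otherwise forcibly terminates it and returns false.
    static boolean runWithTimeout(List<String> command, File output, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        // Redirecting to a file avoids blocking on a full pipe buffer.
        pb.redirectOutput(output);
        Process p = pb.start();
        if (p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return p.exitValue() == 0;
        }
        p.destroyForcibly();
        p.waitFor();
        return false;
    }
}
```

It could be called with something like 
runWithTimeout(List.of("java", "-jar", "/path/to/tika-app-1.3.jar", "-t", "/path/to/file.xls"), outFile, 60), 
so that only the child JVM is lost when a document triggers the problem.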

