[ https://issues.apache.org/jira/browse/TIKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916517#action_12916517 ]

Stephen Duncan Jr commented on TIKA-521:
----------------------------------------

Using the POI API directly, with its event-based model, I was able to parse 
the file using less than 20MB of heap space (less than 64MB of heap size 
allocated).  Can Tika be modified to use the event-based API when extracting 
text?  Here's the sample code I used:

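import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;

// Extract text through POI's event-based (SAX) extractor
final String filePath = "C:\\Users\\stephen.duncan\\tmp\\memory-test.xlsx";
XSSFEventBasedExcelExtractor extractor = new XSSFEventBasedExcelExtractor(filePath);

String text = extractor.getText();
System.out.println(text);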
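
As a rough illustration of why the event-based model needs so little heap, 
here is a minimal sketch that uses only the JDK's SAX parser. It is not POI or 
Tika code. It assumes a simplified sheet XML in which cell values sit in <v> 
elements, and the names EventBasedSketch and extractCellValues are 
hypothetical. The handler sees one element at a time and keeps only the 
extracted text, so no in-memory tree of the whole sheet is ever built.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class EventBasedSketch {

    /** Streams the XML and returns the text of each <v> element, one per line. */
    static String extractCellValues(InputStream in) throws Exception {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inValue;  // true while inside a <v> element

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("v".equals(qName)) {
                    inValue = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append each chunk.
                if (inValue) {
                    out.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    inValue = false;
                    out.append('\n');
                }
            }
        };
        // The default factory is not namespace-aware, so qName carries the tag name.
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Simplified stand-in for a sheet part of an .xlsx file (assumption).
        String xml = "<sheetData><row><c><v>1</v></c><c><v>2.5</v></c></row></sheetData>";
        System.out.print(extractCellValues(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```

A real .xlsx sheet also needs shared-string and style lookups, which this 
sketch leaves out; POI's event-based extractor handles those.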

> OutOfMemoryError Parsing XSLX File
> ----------------------------------
>
>                 Key: TIKA-521
>                 URL: https://issues.apache.org/jira/browse/TIKA-521
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 0.7, 0.8
>            Reporter: Stephen Duncan Jr
>         Attachments: memory-test.xlsx
>
>
> I have several XLSX files I'm trying to parse with Tika that are failing with 
> an OutOfMemoryError even when using a large heap size.  For instance, the 
> attached 1.26MB Excel file fails using a 512MB heap.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
