[ https://issues.apache.org/jira/browse/TIKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422235#comment-17422235 ]
Tim Allison commented on TIKA-3561:
-----------------------------------

sheet9 is 1GB uncompressed; sheet10 is 100MB uncompressed. That said, I'm able to parse it with tika-app with -Xmx512m.

I think the key difference is that tika-app consumes a file, whereas it looks like you're consuming an InputStream (?). POI and other zip-based parsers are far more efficient when given a file, because they don't have to slurp the full thing into memory.

> Tika throwing java.lang.OutOfMemoryError
> ----------------------------------------
>
>                 Key: TIKA-3561
>                 URL: https://issues.apache.org/jira/browse/TIKA-3561
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.1.0
>            Reporter: Abha
>            Priority: Major
>         Attachments: Item.zip, out.tar.gz
>
>
> Getting Fatal Exception when processing the attached document {item.content sub doc name is item.xlsx}.
> Below is the exception log -
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:77)
>     at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:177)
>     at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:149)
>     at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
>     at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
>     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
>     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:307)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
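
To illustrate the file-versus-stream point in the comment above, here is a rough, stdlib-only Java sketch. It does not use POI or Tika, and it is not how either library is implemented; the class name `ZipFileVsStream`, its methods, and the sample entry names are all made up for the example. It only shows the general shape of the problem: a zip read from a plain stream has to be walked front to back, so anything that needs to revisit entries ends up holding them in heap buffers (which is roughly what the `ZipArchiveFakeEntry` / `IOUtils.toByteArray` frames in the trace suggest), while a zip opened from a file can be read entry by entry through a small fixed buffer. It assumes Java 9+ for `InputStream.readAllBytes()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of Tika or POI.
class ZipFileVsStream {

    // Stream path: the zip can only be read sequentially, so each entry is
    // copied into a byte[] and kept. Heap use grows with the uncompressed size.
    static Map<String, byte[]> bufferAllEntries(InputStream in) throws IOException {
        Map<String, byte[]> held = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                held.put(entry.getName(), zis.readAllBytes());
            }
        }
        return held;
    }

    // File path: ZipFile uses the central directory for random access, so each
    // entry can be read through one reusable 8 KB buffer and then discarded.
    static long streamEntriesFromFile(Path zip) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zf.getInputStream(entries.nextElement())) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                }
            }
        }
        return total;
    }

    // Writes a small two-entry zip to a temp file (entry names are illustrative).
    static Path writeSampleZip(int firstSize, int secondSize) throws IOException {
        Path zip = Files.createTempFile("sample", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry("sheet9.xml"));
            zos.write(new byte[firstSize]);
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("sheet10.xml"));
            zos.write(new byte[secondSize]);
            zos.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeSampleZip(2000, 1000);
        try (InputStream in = Files.newInputStream(zip)) {
            long heldBytes = 0;
            for (byte[] data : bufferAllEntries(in).values()) {
                heldBytes += data.length;
            }
            System.out.println("stream path held " + heldBytes + " bytes on the heap");
        }
        System.out.println("file path read " + streamEntriesFromFile(zip)
                + " bytes through an 8 KB buffer");
        Files.deleteIfExists(zip);
    }
}
```

Both paths see the same bytes; the difference is how much stays resident at once. With a 1GB sheet, the first approach needs on the order of 1GB of heap just for that entry. In Tika, the usual way to hand the parser a file rather than a bare stream is to wrap the path with `TikaInputStream.get(path)` and pass that to `parse(...)`; whether that alone resolves this particular report is an assumption, not something verified here.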