[
https://issues.apache.org/jira/browse/TIKA-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-3017.
-------------------------------
Fix Version/s: 1.24
Assignee: Tim Allison
Resolution: Fixed
> OOM in XSLFSheet.java
> ---------------------
>
> Key: TIKA-3017
> URL: https://issues.apache.org/jira/browse/TIKA-3017
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.19
> Reporter: Don
> Assignee: Tim Allison
> Priority: Major
> Fix For: 1.24
>
> Attachments: OOM_Slide_18.pptx
>
>
> When Tika parses the attached PowerPoint slide, it OOMs every time. The slide
> is a scrubbed slide from a Microsoft PowerPoint deck. Unfortunately, I have no
> idea how the slide was created. When you open the slide, it looks totally
> blank; however, if you perform a select all on the slide while it is open in
> PowerPoint, you will see that it contains two items, one inside the other.
> The person who created the slide deck is no longer available to give details
> on how the slide was created. The two items appear to be text boxes, but I am
> not sure this is the case, because if either one is removed and replaced with
> a text box using MS PowerPoint, the OOM no longer happens. Also, if the slide
> is opened in LibreOffice and then saved, the OOM does not happen. There seems
> to be something specific about whatever these items really are and how they
> were created.
> The following is the stack trace of the OOM when the slide is parsed by Tika:
> {noformat}
> Executor task launch worker for task 47360
> at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
> at java.util.Arrays.copyOf([JI)[J (Arrays.java:3308)
> at java.util.BitSet.ensureCapacity(I)V (BitSet.java:337)
> at java.util.BitSet.expandTo(I)V (BitSet.java:352)
> at java.util.BitSet.set(I)V (BitSet.java:447)
> at org.apache.poi.xslf.usermodel.XSLFSheet.registerShapeId(I)V (XSLFSheet.java:123)
> at org.apache.poi.xslf.usermodel.XSLFDrawing.<init>(Lorg/apache/poi/xslf/usermodel/XSLFSheet;Lorg/openxmlformats/schemas/presentationml/x2006/main/CTGroupShape;)V (XSLFDrawing.java:47)
> at org.apache.poi.xslf.usermodel.XSLFSheet.initDrawingAndShapes()V (XSLFSheet.java:214)
> at org.apache.poi.xslf.usermodel.XSLFSheet.getShapes()Ljava/util/List; (XSLFSheet.java:201)
> at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(Lorg/apache/tika/sax/XHTMLContentHandler;)V (XSLFPowerPointExtractorDecorator.java:110)
> at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (AbstractOOXMLExtractor.java:136)
> at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLExtractorFactory.java:156)
> at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLParser.java:110)
> at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (AutoDetectParser.java:143)
> at
> {noformat}
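
The top frames of the trace show XSLFSheet.registerShapeId calling
BitSet.set, which grows the BitSet's backing long[] until it can address
the requested bit. A single very large index is therefore enough to force
a very large allocation. The sketch below illustrates that growth using
only java.util.BitSet. The class name ShapeIdBitSetSketch, its helper
methods, and the example IDs are hypothetical; the actual shape ID in the
attached slide is not stated in this report, and this is not POI or Tika
code.

```java
import java.util.BitSet;

// Hypothetical demo class for illustration; not part of POI or Tika.
final class ShapeIdBitSetSketch {

    // Minimum number of 64-bit words a BitSet needs to address bit `bitIndex`.
    static long wordsRequired(int bitIndex) {
        return ((long) bitIndex >> 6) + 1;
    }

    // Minimum bytes of backing long[] storage needed for that bit index.
    static long bytesRequired(int bitIndex) {
        return wordsRequired(bitIndex) * Long.BYTES;
    }

    // Sets one bit in a fresh BitSet and reports how many bits of storage it now holds.
    static int sizeAfterSettingOneBit(int bitIndex) {
        BitSet ids = new BitSet();
        ids.set(bitIndex);
        return ids.size();
    }

    public static void main(String[] args) {
        int modestId = 1_000_000;        // assumed ID, small enough to allocate safely
        int hugeId = Integer.MAX_VALUE;  // assumed worst case; computed only, never allocated

        System.out.println("bits held after set(" + modestId + "): "
                + sizeAfterSettingOneBit(modestId));
        System.out.println("minimum bytes for set(" + hugeId + "): "
                + bytesRequired(hugeId));
    }
}
```

Under these assumptions, one bit set near Integer.MAX_VALUE needs at least
256 MiB of backing storage, even though only a single ID is registered.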
--
This message was sent by Atlassian Jira
(v8.3.4#803005)