[ https://issues.apache.org/jira/browse/TIKA-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-3017.
-------------------------------
    Fix Version/s: 1.24
         Assignee: Tim Allison
       Resolution: Fixed

> OOM in XSLFSheet.java
> ---------------------
>
>                 Key: TIKA-3017
>                 URL: https://issues.apache.org/jira/browse/TIKA-3017
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.19
>            Reporter: Don
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.24
>
>         Attachments: OOM_Slide_18.pptx
>
>
> When Tika parses the attached PowerPoint slide, it runs out of memory (OOM)
> every time. The slide is a scrubbed slide from a Microsoft PowerPoint deck.
> Unfortunately, I have no idea how the slide was created. When you open the
> slide, it looks like a totally blank slide; however, if you select all on the
> slide while it is open in PowerPoint, you will see that it contains two
> items, one inside the other. The person who created the slide deck is no
> longer available to explain how the slide was created. The two items appear
> to be text boxes, but I am not sure this is the case, because if either one
> is removed and replaced with a text box using MS PowerPoint, the OOM no
> longer happens. Also, if the slide is opened in LibreOffice and then saved,
> the OOM does not happen. There seems to be something specific about whatever
> these items really are and how they were created.
> The following is the stack trace of the OOM when the slide is parsed by Tika:
> {noformat}
> Executor task launch worker for task 47360
>  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
>  at java.util.Arrays.copyOf([JI)[J (Arrays.java:3308)
>  at java.util.BitSet.ensureCapacity(I)V (BitSet.java:337)
>  at java.util.BitSet.expandTo(I)V (BitSet.java:352)
>  at java.util.BitSet.set(I)V (BitSet.java:447)
>  at org.apache.poi.xslf.usermodel.XSLFSheet.registerShapeId(I)V (XSLFSheet.java:123)
>  at org.apache.poi.xslf.usermodel.XSLFDrawing.<init>(Lorg/apache/poi/xslf/usermodel/XSLFSheet;Lorg/openxmlformats/schemas/presentationml/x2006/main/CTGroupShape;)V (XSLFDrawing.java:47)
>  at org.apache.poi.xslf.usermodel.XSLFSheet.initDrawingAndShapes()V (XSLFSheet.java:214)
>  at org.apache.poi.xslf.usermodel.XSLFSheet.getShapes()Ljava/util/List; (XSLFSheet.java:201)
>  at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(Lorg/apache/tika/sax/XHTMLContentHandler;)V (XSLFPowerPointExtractorDecorator.java:110)
>  at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (AbstractOOXMLExtractor.java:136)
>  at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLExtractorFactory.java:156)
>  at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (OOXMLParser.java:110)
>  at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(Ljava/io/InputStream;Lorg/xml/sax/ContentHandler;Lorg/apache/tika/metadata/Metadata;Lorg/apache/tika/parser/ParseContext;)V (AutoDetectParser.java:143)
>  at
> {noformat}
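
The top frames of the trace show java.util.BitSet.set(int) growing its backing long[] from inside XSLFSheet.registerShapeId(int). A BitSet allocates storage in proportion to the highest index set, not to the number of bits set, so a single very large index is enough to request a very large array. The sketch below illustrates only that JDK behaviour. It is an assumption, not something stated in this issue, that the slide carries an unusually large shape ID; the class and method names (BitSetGrowthDemo, backingBits, approxBytes) are hypothetical and are not part of POI or Tika.

```java
import java.util.BitSet;

public class BitSetGrowthDemo {

    /** Sets one bit at the given index and returns the bits of storage the BitSet now holds. */
    public static int backingBits(int index) {
        BitSet ids = new BitSet();
        ids.set(index);
        // size() reports the backing storage in bits, not the number of set bits.
        return ids.size();
    }

    /** Minimum bytes of long[] storage a BitSet needs to hold a bit at this index. */
    public static long approxBytes(int index) {
        // One 64-bit word (8 bytes) covers 64 indices.
        return ((long) index / 64 + 1) * 8;
    }

    public static void main(String[] args) {
        // Storage after setting a small index versus a large one.
        System.out.println("bits for index 10:        " + backingBits(10));
        System.out.println("bits for index 1,000,000: " + backingBits(1_000_000));
        // Computed rather than allocated, so the demo itself does not exhaust the heap.
        System.out.println("bytes for Integer.MAX_VALUE: " + approxBytes(Integer.MAX_VALUE));
    }
}
```

At the largest possible index the array alone needs about 256 MiB, and Arrays.copyOf must allocate it while the old array is still live, which may be enough to exhaust a small worker heap.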



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
