https://issues.apache.org/bugzilla/show_bug.cgi?id=52372

             Bug #: 52372
           Summary: OutOfMemoryError parsing a word file
           Product: POI
           Version: 3.8-dev
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: critical
          Priority: P2
         Component: HPSF
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


Created attachment 28090
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28090
An anonymised Doc file reproducing the problem

Calling Tika#parseToString on the attached file produces an OOME.

This appears to be because the size read from the file is not validated
before the allocation in Section.<init> (see the stack trace below). Had it
been C code, this could have been a buffer overflow...
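
To illustrate the kind of check that seems to be missing, here is a minimal
sketch. It is not the POI code: BoundedAlloc, readSection, MAX_SECTION_SIZE
and the "remaining" parameter are hypothetical names, and the layout (a
4-byte little-endian length prefix followed by the payload) is an assumption
made for the example. The idea is only to reject an implausible size before
calling new byte[size].

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of POI or Tika.
final class BoundedAlloc {
    // Assumed upper bound for one section; a real limit would need tuning.
    static final int MAX_SECTION_SIZE = 10 * 1024 * 1024;

    /**
     * Reads a 4-byte little-endian length prefix, validates it, then reads
     * that many bytes. "remaining" is the number of bytes the caller knows
     * are left in the stream after the prefix.
     */
    static byte[] readSection(InputStream in, int remaining) throws IOException {
        DataInputStream data = new DataInputStream(in);
        // readInt() is big-endian, so swap the bytes for a little-endian field.
        int size = Integer.reverseBytes(data.readInt());
        if (size < 0 || size > MAX_SECTION_SIZE || size > remaining) {
            // Fail with a checked exception instead of attempting the allocation.
            throw new IOException("Implausible section size: " + size);
        }
        byte[] buf = new byte[size];
        data.readFully(buf);
        return buf;
    }
}
```

With a check like this, a corrupted length field would surface as an
IOException the caller can handle rather than as an OutOfMemoryError.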

I'm not sure whether the file is corrupted: it opens fine in Word on both Mac
and Windows. Saving the file in either of these editors makes the problem
disappear, so we manually edited the content of the file to anonymise it while
keeping it as close as possible to the original. We're able to create similar
problems by flipping bits in files.
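
For reference, the bit flipping can be sketched roughly as follows. This is
an illustration, not the test harness from the stack trace: BitFlipper and
flipBits are hypothetical names, and the seed parameter is an assumption
added so that a failing mutation can be reproduced.

```java
import java.util.Random;

// Hypothetical helper for illustration; not the code used in our tests.
final class BitFlipper {
    /**
     * Returns a copy of the input with "flips" randomly chosen bits inverted.
     * The input array is left untouched. The same seed always yields the same
     * mutation. A bit may be picked more than once when flips > 1.
     */
    static byte[] flipBits(byte[] original, int flips, long seed) {
        byte[] mutated = original.clone();
        if (mutated.length == 0) {
            return mutated; // nothing to flip
        }
        Random rnd = new Random(seed);
        for (int i = 0; i < flips; i++) {
            int pos = rnd.nextInt(mutated.length);   // which byte
            mutated[pos] ^= (byte) (1 << rnd.nextInt(8)); // which bit in it
        }
        return mutated;
    }
}
```

The mutated bytes can then be wrapped in a ByteArrayInputStream and passed to
the parser, recording the seed of any input that triggers an error.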

java.lang.OutOfMemoryError: Java heap space
    at org.apache.poi.hpsf.Section.<init>(Section.java:207)
    at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:451)
    at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:246)
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:73)
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:64)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:177)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.Tika.parseToString(Tika.java:380)
    at org.apache.tika.Tika.parseToString(Tika.java:414)
    at no.finntech.tika.harderner.TikaIndexerHardenerTest.parseContent(TikaIndexerHardenerTest.java:100)
    at no.finntech.tika.harderner.TikaIndexerHardenerTest.indexContent(TikaIndexerHardenerTest.java:91)
    at no.finntech.tika.harderner.TikaIndexerHardenerTest.originalFileIndexesProperly2(TikaIndexerHardenerTest.java:34)
