https://issues.apache.org/bugzilla/show_bug.cgi?id=52372
Bug #: 52372
Summary: OutOfMemoryError parsing a word file
Product: POI
Version: 3.8-dev
Platform: All
OS/Version: All
Status: NEW
Severity: critical
Priority: P2
Component: HPSF
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
Created attachment 28090
--> https://issues.apache.org/bugzilla/attachment.cgi?id=28090
An anonymised Doc file reproducing the problem
Calling Tika#parseToString on the attached file produces an OOME.
This appears to be because the size it tries to allocate is taken from the
file without validation; the trace below points at the POI HPSF Section
constructor. Had it been C code, this could have been a buffer overflow...
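A minimal sketch of the kind of check that seems to be missing, assuming the length is a value read from the file just before the array is allocated. BoundedAlloc, allocate, and the MAX_LENGTH cap are hypothetical names used for illustration; they are not POI or Tika API, and the real fix may look different:

```java
import java.io.IOException;

final class BoundedAlloc {
    // Hypothetical upper bound; a sensible limit depends on the file format.
    static final int MAX_LENGTH = 1_000_000;

    /**
     * Allocates a buffer for a length field read from an untrusted file.
     *
     * @param declaredLength the length value as read from the file
     * @param remaining      the number of bytes actually left in the input
     */
    static byte[] allocate(long declaredLength, long remaining) throws IOException {
        // Reject negative or implausibly large values before allocating anything.
        if (declaredLength < 0 || declaredLength > MAX_LENGTH) {
            throw new IOException("Implausible length in file: " + declaredLength);
        }
        // A field cannot be longer than the data that follows it.
        if (declaredLength > remaining) {
            throw new IOException("Declared length " + declaredLength
                    + " exceeds remaining input " + remaining);
        }
        return new byte[(int) declaredLength];
    }
}
```

With a check like this, a corrupted length field would surface as an IOException that the caller can handle, rather than an OutOfMemoryError.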
Not sure if the file is corrupted or not; it opens fine in Word on both Mac
and Windows. Saving the file in one of these editors causes the problem to
disappear, so we've manually edited the content of the file to anonymise it yet
keep it as close as possible to the original. We're able to create similar
problems by flipping bits in files.
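For reference, a sketch of how a single-bit mutation of a file's bytes can be produced. This is an illustration only; BitFlipper and flipBit are hypothetical names, and the code is not taken from the test class that appears in the trace:

```java
import java.util.Arrays;

final class BitFlipper {
    /**
     * Returns a copy of data with one bit inverted.
     * Bit 0 is the least significant bit of byte 0, bit 8 is the
     * least significant bit of byte 1, and so on.
     */
    static byte[] flipBit(byte[] data, int bitIndex) {
        if (bitIndex < 0 || bitIndex >= data.length * 8L) {
            throw new IllegalArgumentException("bitIndex out of range: " + bitIndex);
        }
        // Work on a copy so the original bytes stay available for comparison.
        byte[] mutated = Arrays.copyOf(data, data.length);
        mutated[bitIndex / 8] ^= (byte) (1 << (bitIndex % 8));
        return mutated;
    }
}
```

Each mutated copy can then be wrapped in a ByteArrayInputStream and passed to the parser, recording any input that ends in an error such as the one reported here.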
java.lang.OutOfMemoryError: Java heap space
at org.apache.poi.hpsf.Section.<init>(Section.java:207)
at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:451)
at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:246)
at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:73)
at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:64)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:177)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.Tika.parseToString(Tika.java:380)
at org.apache.tika.Tika.parseToString(Tika.java:414)
at no.finntech.tika.harderner.TikaIndexerHardenerTest.parseContent(TikaIndexerHardenerTest.java:100)
at no.finntech.tika.harderner.TikaIndexerHardenerTest.indexContent(TikaIndexerHardenerTest.java:91)
at no.finntech.tika.harderner.TikaIndexerHardenerTest.originalFileIndexesProperly2(TikaIndexerHardenerTest.java:34)