Akash created TIKA-3154:
---------------------------
Summary: Exception while extracting msg files
Key: TIKA-3154
URL: https://issues.apache.org/jira/browse/TIKA-3154
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.24.1
Reporter: Akash
While parsing an .msg file that contains some HTML text, we get an
exception from Tika.
Command: java -jar tika-app-1.24.1.jar html_code.msg
Exception:
See tika-parsers/pom.xml for the correct version.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
    at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
    at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
    at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
    at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
    at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
    at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
    at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 5 more
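
To illustrate the failure mode in the trace, here is a minimal, stdlib-only sketch of a size guard that rejects allocations above a per-record maximum unless a global override is set. It is a simplified model, not POI's actual source: the class AllocationGuardSketch, its method names, and the rule that a positive override replaces the per-record maximum are all assumptions made for illustration. The only real API referenced is IOUtils.setByteArrayMaxOverride(), which the error message itself names as the temporary workaround.

```java
// Simplified stand-in for a per-record allocation guard.
// Hypothetical class and method names; this is NOT POI's implementation.
final class AllocationGuardSketch {

    // A value <= 0 means "no override set", so the per-record maximum applies.
    private static int maxOverride = -1;

    // Hypothetical analogue of the override mentioned in the error message.
    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    // Allocate a byte array only if the requested length is within the limit.
    static byte[] safelyAllocate(long length, int recordMax) {
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Invalid array length: " + length);
        }
        // Assumption: a positive override takes the place of the record maximum.
        int limit = maxOverride > 0 ? maxOverride : recordMax;
        if (length > limit) {
            throw new IllegalStateException("Tried to allocate an array of length "
                    + length + ", but " + limit
                    + " is the maximum for this record type.");
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        // Sizes taken from the stack trace: 1326748 requested, 1000000 allowed.
        try {
            safelyAllocate(1_326_748L, 1_000_000);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Raising the override lets the same request through.
        setMaxOverride(2_000_000);
        byte[] body = safelyAllocate(1_326_748L, 1_000_000);
        System.out.println("Allocated " + body.length + " bytes");
    }
}
```

With POI on the classpath, the equivalent workaround would presumably be a call to IOUtils.setByteArrayMaxOverride() with a value above 1326748 before Tika parses the file. Whether that can be applied when running the tika-app jar from the command line is not covered by this report.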