[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174961#comment-17174961
]
Tim Allison commented on TIKA-3154:
-----------------------------------
Opened: https://bz.apache.org/bugzilla/show_bug.cgi?id=64659
> Exception while extracting msg files
> ------------------------------------
>
> Key: TIKA-3154
> URL: https://issues.apache.org/jira/browse/TIKA-3154
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24.1
> Reporter: Akash
> Priority: Major
>
> While parsing an msg file that contains some HTML text, we get an
> exception from Tika.
> Command: java -jar tika-app-1.24.1.jar html_code.msg
> Exception:
> {code:java}
> Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
>     at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
>     at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
>     at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
>     at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
>     at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
>     at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
>     at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}
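
The "Caused by" message in the trace describes a size guard: a record asked for a 1326748-byte array, the limit for that record type is 1000000, and a global override can raise the limit. The sketch below is a simplified, self-contained model of that kind of guard, written only to illustrate the behaviour the message describes. It is not Apache POI's source code; the class AllocationLimitSketch and its methods are hypothetical names, and the override value 2000000 is an arbitrary illustration.

```java
// Simplified model of a per-record allocation guard.
// Hypothetical names throughout; this is NOT Apache POI's implementation.
final class AllocationLimitSketch {

    // Assumed convention: a value <= 0 means "no override, use the per-record limit".
    private static int maxOverride = -1;

    // Models the idea behind the override mentioned in the exception message.
    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    // Returns the limit that is actually enforced for a given record type.
    static int effectiveLimit(int recordLimit) {
        return maxOverride > 0 ? maxOverride : recordLimit;
    }

    // Allocates the array only if the requested length is within the enforced limit.
    static byte[] safelyAllocate(long length, int recordLimit) {
        int limit = effectiveLimit(recordLimit);
        if (length < 0 || length > limit) {
            throw new IllegalStateException(String.format(
                "Tried to allocate an array of length %d, but %d is the maximum for this record type.",
                length, limit));
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        long requested = 1_326_748L; // length reported in the trace
        int recordLimit = 1_000_000; // limit reported in the trace

        try {
            safelyAllocate(requested, recordLimit);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Raising the override lets the same request through.
        setMaxOverride(2_000_000);
        byte[] buffer = safelyAllocate(requested, recordLimit);
        System.out.println("Allocated " + buffer.length + " bytes");
    }
}
```

In an application that embeds Tika, the corresponding step would presumably be a single call to the method named in the exception message, IOUtils.setByteArrayMaxOverride(), made before parsing. A suitable value depends on the files being processed and on how much memory an untrusted file should be allowed to claim.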
--
This message was sent by Atlassian Jira
(v8.3.4#803005)