[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178797#comment-17178797
]
Akash edited comment on TIKA-3154 at 8/17/20, 7:33 AM:
-------------------------------------------------------
[~tallison] Can we make this a configuration parameter rather than hard-coding it?
POI does provide an API to override this value. If that API can be invoked from
Tika code, then we can set the value as required.
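The following is a minimal sketch of how such a configuration parameter could work. It assumes a hypothetical system property named tika.poi.byteArrayMaxOverride, which neither Tika 1.24.1 nor POI is claimed to define. The class ByteArrayLimitConfig and its methods are also hypothetical. The resolved value goes to a setter. With POI on the classpath, that setter would be org.apache.poi.util.IOUtils::setByteArrayMaxOverride, the method named in the exception message. To keep the sketch dependency-free, the demo only prints the value.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical helper: reads the POI byte-array limit from a system property
 * instead of hard-coding it, and hands the value to a setter.
 */
public class ByteArrayLimitConfig {
    // Hypothetical property name; not defined by Tika or POI.
    static final String PROPERTY = "tika.poi.byteArrayMaxOverride";

    /**
     * Returns the configured limit, or -1 when the property is absent or blank.
     * Rejects non-numeric and non-positive values so a typo fails loudly.
     */
    static int resolveLimit(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return -1;
        }
        // Integer.parseInt throws NumberFormatException on non-numeric input.
        int value = Integer.parseInt(configured.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(PROPERTY + " must be positive, got " + value);
        }
        return value;
    }

    /** Applies the configured limit, if any, through the supplied setter. */
    static boolean applyFromSystemProperty(IntConsumer setter) {
        int limit = resolveLimit(System.getProperty(PROPERTY));
        if (limit < 0) {
            return false; // nothing configured: leave POI's per-record maximums alone
        }
        setter.accept(limit);
        return true;
    }

    public static void main(String[] args) {
        // With POI on the classpath, this call would instead be:
        //   applyFromSystemProperty(org.apache.poi.util.IOUtils::setByteArrayMaxOverride);
        // It must run before any parsing starts.
        boolean applied = applyFromSystemProperty(v -> System.out.println("override = " + v));
        if (!applied) {
            System.out.println("no override configured");
        }
    }
}
```

Example run: java -Dtika.poi.byteArrayMaxOverride=5000000 ByteArrayLimitConfig. setByteArrayMaxOverride is a static setter, so the override applies to every POI-based parser in the JVM. Setting it once at startup is therefore simpler than setting it per document.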
was (Author: akki1607):
[~tallison] Can we make this a configuration parameter rather than hard-coding it?
> Exception while extracting msg files
> ------------------------------------
>
> Key: TIKA-3154
> URL: https://issues.apache.org/jira/browse/TIKA-3154
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24.1
> Reporter: Akash
> Priority: Major
>
> While parsing an .msg file that contains some HTML text, we get an
> exception from Tika.
> Command: java -jar tika-app-1.24.1.jar html_code.msg
> Exception:
> {code:java}
> Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
>     at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
>     at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
>     at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
>     at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
>     at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
>     at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
>     at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)