[
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Akash updated TIKA-3154:
------------------------
Description:
While parsing an .msg file that contains HTML text, Tika throws an
exception.
Command: java -jar tika-app-1.24.1.jar html_code.msg
Exception:
{code:java}
Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
    at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
    at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
    at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
    at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
    at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
    at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
    at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}
was:
While parsing an .msg file that contains HTML text, Tika throws an
exception.
Command: java -jar tika-app-1.24.1.jar html_code.msg
Exception:
See tika-parsers/pom.xml for the correct version.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
{code:java}
    at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
    at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
    at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
    at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
    at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
    at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
    at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 5 more
{code}
> Exception while extracting msg files
> ------------------------------------
>
> Key: TIKA-3154
> URL: https://issues.apache.org/jira/browse/TIKA-3154
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24.1
> Reporter: Akash
> Priority: Major
>
> While parsing an .msg file that contains HTML text, Tika throws an
> exception.
> Command: java -jar tika-app-1.24.1.jar html_code.msg
> Exception:
> {code:java}
> Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
>     at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
>     at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
>     at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
>     at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
>     at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
>     at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
>     at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}
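
For anyone reading the trace: the failure is a size guard, not a parse error as such. The RTF body wants 1326748 bytes and the limit for that record type is 1000000, so the allocation is refused. Below is a minimal, self-contained sketch of how such a guard and its override might interact. It is a simplified model written for illustration, not POI's actual code: the class AllocationLimitSketch and the methods setMaxOverride and safelyAllocate are hypothetical names. The real hook named in the error message is IOUtils.setByteArrayMaxOverride().

```java
public class AllocationLimitSketch {
    // Hypothetical global override; a value <= 0 means "use the per-record limit".
    private static int maxOverride = -1;

    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    // Simplified model of a guarded allocation: refuse lengths above the
    // effective limit instead of allocating whatever the file asks for.
    static byte[] safelyAllocate(long length, int recordMax) {
        int limit = (maxOverride > 0) ? maxOverride : recordMax;
        if (length < 0 || length > limit) {
            throw new IllegalStateException("Tried to allocate an array of length "
                    + length + ", but " + limit + " is the maximum for this record type.");
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        long requested = 1326748L; // length reported in the trace
        int recordMax = 1000000;   // limit reported in the trace

        try {
            safelyAllocate(requested, recordMax);
        } catch (IllegalStateException e) {
            System.out.println("Without override: " + e.getMessage());
        }

        // Assumed override value; any limit >= the requested length lets it through.
        setMaxOverride(2000000);
        byte[] body = safelyAllocate(requested, recordMax);
        System.out.println("With override, allocated bytes: " + body.length);
    }
}
```

In this model the override replaces the per-record limit for every caller, which is why the error message describes it as a temporary workaround: raising it globally also weakens the protection against corrupt files that request very large arrays.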