https://bz.apache.org/bugzilla/show_bug.cgi?id=65649
Bug ID: 65649
Summary: org.apache.poi.util.RecordFormatException: Tried to
allocate an array of length 20785132, but 10000000 is
the maximum for this record type
Product: POI
Version: 4.1.2-FINAL
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: XWPF
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
Created attachment 38076
--> https://bz.apache.org/bugzilla/attachment.cgi?id=38076&action=edit
sample
I processed about 30,000 documents and found about 60 .docx files that fail
with a huge record length (roughly 20,000,000 to 50,000,000).
Similar issue for ppt: https://bz.apache.org/bugzilla/show_bug.cgi?id=65639
Stacktrace:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@29b3f521
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:297)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
<business logic>
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an
array of length 20785132, but 10000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with
IOUtils.setByteArrayMaxOverride()
at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:205)
at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:173)
at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:149)
at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:307)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:114)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
... 11 more
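
For triage: the trace shows the allocation failing while ZipArchiveFakeEntry reads a whole zip entry into a byte array. A rough way to find which entries in a .docx exceed the limit, without going through POI, is to stream the package with java.util.zip and count the uncompressed bytes per entry. The sketch below is only an illustration. The class name OversizedZipEntryScanner and the method findOversizedEntries are hypothetical, and the default limit of 10000000 is taken from the exception message above, not from the POI sources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of POI or Tika.
public class OversizedZipEntryScanner {

    // Limit copied from the exception message in this report (assumption).
    static final long DEFAULT_LIMIT = 10_000_000L;

    /**
     * Streams the zip and returns entry name -> uncompressed size for every
     * entry larger than limitBytes. Sizes are counted while reading because
     * ZipEntry.getSize() can be -1 when reading from a stream.
     */
    public static Map<String, Long> findOversizedEntries(InputStream in, long limitBytes)
            throws IOException {
        Map<String, Long> oversized = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                long size = 0;
                int read;
                // Read and discard the data; only the byte count is kept.
                while ((read = zip.read(buffer)) != -1) {
                    size += read;
                }
                if (size > limitBytes) {
                    oversized.put(entry.getName(), size);
                }
            }
        }
        return oversized;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a .docx (or any zip-based) file.
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                Map<String, Long> hits = findOversizedEntries(in, DEFAULT_LIMIT);
                for (Map.Entry<String, Long> hit : hits.entrySet()) {
                    System.out.println(path + ": " + hit.getKey() + " = " + hit.getValue() + " bytes");
                }
            }
        }
    }
}
```

Running this over the affected files should show which part of each package is oversized. That in turn suggests a value to pass to IOUtils.setByteArrayMaxOverride(), the workaround named in the exception message, if the files turn out not to be corrupt.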