https://bz.apache.org/bugzilla/show_bug.cgi?id=65649

            Bug ID: 65649
           Summary: org.apache.poi.util.RecordFormatException: Tried to
                    allocate an array of length 20785132, but 1000000 is
                    the maximum for this record type
           Product: POI
           Version: 4.1.2-FINAL
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XWPF
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Created attachment 38076
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=38076&action=edit
sample

I processed 30k documents and found ~60 .docx files that fail with this
exception, with requested record lengths of roughly 20,000,000 to 50,000,000
bytes.
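To find the affected documents up front instead of one exception at a time, one option is to scan each package with the JDK's java.util.zip and flag entries whose decompressed size exceeds the 10,000,000-byte limit quoted in the exception below. This is a sketch under an assumption: the stack trace shows the check firing while ZipInputStreamZipEntrySource reads each zip entry into memory, so the decompressed size of a single entry is taken here as what the check measures. The class name OversizedEntryScanner is hypothetical and not part of POI or Tika.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/**
 * Hypothetical helper (not part of POI or Tika): lists the zip entries of an
 * OOXML package (.docx, .pptx, ...) whose decompressed size exceeds a limit.
 */
public class OversizedEntryScanner {

    /** Limit quoted in the RecordFormatException message; adjust as needed. */
    public static final long DEFAULT_LIMIT = 10_000_000L;

    /** Returns entry name -> decompressed size for every entry larger than limit. */
    public static Map<String, Long> findOversized(InputStream in, long limit) throws IOException {
        Map<String, Long> oversized = new LinkedHashMap<>();
        byte[] buf = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // ZipEntry.getSize() can be -1 for streamed entries, so count
                // the decompressed bytes instead of trusting the header.
                long size = 0;
                int n;
                while ((n = zip.read(buf)) != -1) {
                    size += n;
                }
                if (size > limit) {
                    oversized.put(entry.getName(), size);
                }
            }
        }
        return oversized;
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                Map<String, Long> hits = findOversized(in, DEFAULT_LIMIT);
                if (!hits.isEmpty()) {
                    System.out.println(path + " -> " + hits);
                }
            }
        }
    }
}
```

With Java 11 or later it can be run directly, e.g. java OversizedEntryScanner.java *.docx, which prints each file that has at least one oversized entry together with the entry names and sizes.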

A similar issue was reported for PPT: https://bz.apache.org/bugzilla/show_bug.cgi?id=65639

Stacktrace:

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@29b3f521
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:297)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

<business logic>

Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 20785132, but 10000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
        at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
        at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:205)
        at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:173)
        at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:149)
        at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
        at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
        at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:307)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:114)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
        ... 11 more
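The exception message describes two limits: a per-record maximum (10000000 here) and a higher override set via IOUtils.setByteArrayMaxOverride(). The sketch below is a simplified, hypothetical model of that rule, not POI's source: it assumes, as the message suggests, that a positive override replaces the per-record maximum. The class LengthCheckModel and its methods are illustrative names only.

```java
/**
 * Simplified, hypothetical model of the length check described by the
 * RecordFormatException message. NOT POI's source: it only mirrors the
 * described behaviour, where a positive global override (cf.
 * IOUtils.setByteArrayMaxOverride()) replaces the per-record maximum.
 */
public class LengthCheckModel {

    // <= 0 means "no override in force"; static, so it applies process-wide.
    private static int byteArrayMaxOverride = -1;

    public static void setByteArrayMaxOverride(int maxOverride) {
        byteArrayMaxOverride = maxOverride;
    }

    /** Throws if an array of the given length would exceed the limit in force. */
    public static void checkLength(long length, int maxLength) {
        int limit = byteArrayMaxOverride > 0 ? byteArrayMaxOverride : maxLength;
        if (length > limit) {
            throw new IllegalStateException("Tried to allocate an array of length " + length
                    + ", but " + limit + " is the maximum for this record type.");
        }
    }

    public static void main(String[] args) {
        try {
            checkLength(20_785_132L, 10_000_000); // the case from this report
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        setByteArrayMaxOverride(50_000_000);      // covers the ~50,000,000 upper bound seen
        checkLength(20_785_132L, 10_000_000);     // now accepted
        System.out.println("accepted with override");
    }
}
```

One design consequence worth noting: in POI 4.1.2 the real setter is also static, so raising the override lifts the limit for every record type and every document parsed in the same JVM, not only for the oversized .docx entries.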
