https://issues.apache.org/bugzilla/show_bug.cgi?id=52069

             Bug #: 52069
           Summary: Heap out of memory errors for large xlsx files - even
                    when using PipedReader to read file
           Product: POI
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: major
          Priority: P2
         Component: XSSF
        AssignedTo: dev@poi.apache.org
        ReportedBy: meghana.vishwan...@gmail.com
    Classification: Unclassified


While parsing an xlsx file of about 4 MB with Apache Tika 0.9, I ran into a
heap out-of-memory error. I am using PipedReader and PipedWriter to access the
file content, and the same code has been running fine with much larger files,
so I do not believe the heap size allocation is the problem.
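
For reference, a piped setup of this kind can be sketched roughly as follows.
This is a minimal illustration and not the actual code from this report: the
class PipedExtractionSketch, the TextProducer interface and the drain method
are hypothetical names standing in for whatever writes the extracted text. It
uses only java.io.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Writer;

final class PipedExtractionSketch {

    /** Hypothetical stand-in for a parser that writes extracted text to a Writer. */
    interface TextProducer {
        void writeTo(Writer out) throws IOException;
    }

    /** Runs the producer on its own thread and counts the characters read from the pipe. */
    static long drain(TextProducer producer) throws Exception {
        PipedReader reader = new PipedReader();
        PipedWriter writer = new PipedWriter(reader);
        Throwable[] failure = new Throwable[1];

        Thread producerThread = new Thread(() -> {
            // Closing the writer signals end-of-stream to the reading side,
            // even if the producer fails (for example with OutOfMemoryError).
            try (PipedWriter w = writer) {
                producer.writeTo(w);
            } catch (Throwable t) {
                failure[0] = t;
            }
        });
        producerThread.start();

        long total = 0;
        char[] buf = new char[4096];
        try (PipedReader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                total += n; // a real consumer would process buf[0..n) here
            }
        }
        producerThread.join();
        if (failure[0] != null) {
            throw new Exception("producer failed", failure[0]);
        }
        return total;
    }
}
```

Note that the pipe bounds only the extracted text in flight between the two
threads (PipedReader holds a small fixed-size buffer). It does not limit
whatever the producer allocates internally while generating that text.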

Looking at the memory consumption with a profiler, I found that the instance
counts of two classes, org.apache.xmlbeans.impl.store.Xobj$AttrXobj and
Xobj$ElementXobj, seem to grow very rapidly with file size. For the file
mentioned above, there were more than 1,600,000 objects of type Xobj$AttrXobj.
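
To illustrate the difference between building an in-memory tree of the sheet
XML and streaming over it, here is a minimal sketch that counts cell elements
with the JDK's SAX parser instead of XMLBeans. It is an illustration under
stated assumptions, not how Tika or POI parse the file: the class name
StreamingCellCounter and the method countCells are hypothetical, and it assumes
that worksheet parts live under xl/worksheets/ and that cells are <c> elements,
as in standard xlsx packages.

```java
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class StreamingCellCounter {

    /** Counts <c> (cell) elements in every worksheet part of an .xlsx stream. */
    static long countCells(InputStream xlsx) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);

        final long[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Only a counter is kept; no per-element objects are retained.
                if ("c".equals(localName)) {
                    count[0]++;
                }
            }
        };

        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    SAXParser parser = factory.newSAXParser();
                    // The parser may close its input when done, so shield the
                    // zip stream to keep the remaining entries readable.
                    InputStream shielded = new FilterInputStream(zip) {
                        @Override
                        public void close() {
                            // intentionally left open
                        }
                    };
                    parser.parse(new InputSource(shielded), handler);
                }
            }
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countCells(in) + " cells");
        }
    }
}
```

Because the handler keeps only a counter, memory use should stay roughly flat
as the sheet grows, whereas a tree-based parse allocates objects for every
element and attribute it reads.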

I am attaching the xlsx file which caused this error. 

Note: this error also occurs for .docx files.

