https://bz.apache.org/bugzilla/show_bug.cgi?id=60567

            Bug ID: 60567
           Summary: XSSFReader caused OutOfMemoryError when reading a
                    large excel file in HDFS as inputStream
           Product: POI
           Version: 3.14-FINAL
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: XSSF
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected]
        Depends on: 57842
  Target Milestone: ---

My project uses the POI library to read an Excel file stored in HDFS. The API
call I use is shown below:
==================
// inputStream is created from an HDFS path, because OPCPackage cannot
// open an HDFS path directly.
XSSFReader xssfReader = new XSSFReader(OPCPackage.open(inputStream));
==================

The Excel file has around 1,000,000 rows of simple data (columns such as name,
id, address, etc.) and is around 140MB in size. When I run my project, the
process consumes about 3.25GB of memory, which is far more than the size of the
Excel file itself.

AFAIK, XSSFReader uses much less memory when the package is opened from a
String path or a File than when it is opened from an InputStream. In my case,
however, the Excel file lives in HDFS, so we cannot pass the HDFS path to
XSSFReader directly.
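
One possible workaround, sketched here only as an illustration, is to copy the
HDFS stream to a local temporary file first and then open the package from that
File. The helper below uses only the JDK; the class name StreamSpooler and the
method spoolToTempFile are hypothetical names made up for this sketch, and it
assumes there is enough local disk space for a copy of the workbook.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not part of POI or Hadoop.
final class StreamSpooler {
    private StreamSpooler() {}

    /**
     * Copies the whole stream into a new local temp file and returns its path.
     * The caller is responsible for closing the stream and deleting the file.
     */
    static Path spoolToTempFile(InputStream in) throws IOException {
        // createTempFile already creates an empty file, so the copy must
        // be allowed to replace it.
        Path tmp = Files.createTempFile("hdfs-xlsx-", ".xlsx");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            // Do not leave a partial copy behind if the transfer fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }
}
```

With POI on the classpath, the returned path could then be passed to the
File-based overload, e.g. new XSSFReader(OPCPackage.open(tmp.toFile())), and
the temp file deleted once the package has been closed. I have not measured
how much memory this saves for the 140MB file above.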

Could you please help fix the issue of XSSFReader using much more memory when
reading from an InputStream?

Thank you.


Referenced Bugs:

https://bz.apache.org/bugzilla/show_bug.cgi?id=57842
[Bug 57842] Using POI 3.9 API memory consumed reading an xlsx file is not
released back to the operating system after completion