Charles Givre created DRILL-7641:
------------------------------------

             Summary: Convert Excel Reader to Use Streaming Reader
                 Key: DRILL-7641
                 URL: https://issues.apache.org/jira/browse/DRILL-7641
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Text & CSV
    Affects Versions: 1.17.0
            Reporter: Charles Givre
            Assignee: Charles Givre
             Fix For: 1.18.0

The current implementation of the Excel reader uses the Apache POI reader, which consumes excessive amounts of memory. As a result, attempting to read large Excel files causes out-of-memory errors. This PR converts the format plugin to use a streaming reader that is still based on the POI library. The documentation for the streaming reader can be found at [1]. All unit tests pass, and I tested the plugin with some large Excel files on my computer.

[1]: https://github.com/pjfanning/excel-streaming-reader
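
For readers unfamiliar with the distinction, the sketch below illustrates the general idea behind a streaming reader: rows are pulled from the sheet XML one at a time and handed to a callback, so memory use depends on the size of a row rather than the size of the workbook. This is not the code in this PR and not the API of excel-streaming-reader. It uses only the JDK's StAX parser, and the class name StreamingRowSketch, the RowHandler interface, and the simplified sheet XML (no namespaces, no shared strings) are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingRowSketch {

    /** Hypothetical callback; not part of Drill, POI, or excel-streaming-reader. */
    public interface RowHandler {
        void onRow(List<String> cells);
    }

    /**
     * Walks simplified sheet XML with a pull parser and emits one row at a time.
     * Only the cells of the current row are held in memory.
     * Returns the number of rows seen.
     */
    public static int streamRows(String sheetXml, RowHandler handler) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(sheetXml));
        int rowCount = 0;
        List<String> cells = null;
        try {
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if ("row".equals(name)) {
                        // Start collecting cells for a new row.
                        cells = new ArrayList<>();
                    } else if ("v".equals(name) && cells != null) {
                        // <v> holds the raw cell value in this simplified layout.
                        cells.add(xml.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(xml.getLocalName())) {
                    // Hand the finished row to the caller, then drop it.
                    handler.onRow(cells);
                    cells = null;
                    rowCount++;
                }
            }
        } finally {
            xml.close();
        }
        return rowCount;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Assumed, heavily simplified stand-in for a worksheet's XML part.
        String sheetXml = "<sheetData>"
                + "<row><c><v>1</v></c><c><v>2</v></c></row>"
                + "<row><c><v>3</v></c><c><v>4</v></c></row>"
                + "</sheetData>";
        int rows = streamRows(sheetXml, cells -> System.out.println(cells));
        System.out.println(rows + " rows streamed");
    }
}
```

A real .xlsx worksheet also involves a zip container, XML namespaces, shared strings, and cell types, which is the kind of detail a library such as the one referenced in [1] is expected to handle.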