[ https://issues.apache.org/jira/browse/DRILL-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066199#comment-17066199 ]
Charles Givre commented on DRILL-7641:
--------------------------------------

[~arina],
Why was the fix version changed? I believe this PR is reviewable and is not a major change. I don't think this will be a major task to get this into Drill 1.18.

> Convert Excel Reader to Use Streaming Reader
> --------------------------------------------
>
>                 Key: DRILL-7641
>                 URL: https://issues.apache.org/jira/browse/DRILL-7641
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>   Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>
> The current implementation of the Excel reader uses the Apache POI reader,
> which uses excessive amounts of memory. As a result, attempting to read large
> Excel files will cause out of memory errors.
> This PR converts the format plugin to use a streaming reader, based still on
> the POI library. The documentation for the streaming reader can be found
> here. [1]
> All unit tests pass and I tested the plugin with some large Excel files on my
> computer.
> [1]: [https://github.com/pjfanning/excel-streaming-reader]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)