[ https://issues.apache.org/jira/browse/DRILL-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Givre updated DRILL-7514:
---------------------------------
Description:
Drill's Excel Format Plugin uses Apache POI to parse Excel files. While this
reader is effective in that it parses formulae and data types, it uses memory
inefficiently and will struggle to read very large Excel files.
The latest version of POI addresses some of these memory issues, so upgrading
should allow Drill to query larger Excel files without running out of memory.
was:
Drill's Excel Format Plugin uses Apache POI to parse Excel files. While this
reader is effective in that it parses formulae and data types, it uses memory
inefficiently and will struggle to read very large Excel files.
The Excel Streaming Reader [1] is an extension for POI that would enable
Drill to parse Excel files much more efficiently and greatly improve the
reader's performance.
[1]: https://github.com/monitorjbl/excel-streaming-reader
> Update Apache POI to Latest Version
> -----------------------------------
>
> Key: DRILL-7514
> URL: https://issues.apache.org/jira/browse/DRILL-7514
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Minor
> Fix For: 1.18.0
>
>
> Drill's Excel Format Plugin uses Apache POI to parse Excel files. While this
> reader is effective in that it parses formulae and data types, it uses memory
> inefficiently and will struggle to read very large Excel files.
> The latest version of POI addresses some of these memory issues, so upgrading
> should allow Drill to query larger Excel files without running out of memory.
>