[
https://issues.apache.org/jira/browse/DRILL-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charles Givre resolved DRILL-7423.
----------------------------------
Resolution: Resolved
> Create More Efficient Way to Read Excel Cells
> ---------------------------------------------
>
> Key: DRILL-7423
> URL: https://issues.apache.org/jira/browse/DRILL-7423
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Charles Givre
> Priority: Major
>
> The Excel format plugin reads cells, but the reading process could be made
> more efficient. Since the schema of an Excel file is not known in advance,
> Drill must read the first row of data in order to extract the schema.
> In practice this takes two steps. To read the column names, Drill must first
> read the header rows and convert them all into Strings. This yields the
> header names, if present.
> Drill cannot create writers until it reads the first row of data, at which
> point it determines the data types. This creates an inefficiency: when Drill
> writes the columns, it has to do a hash lookup for each column. Since the
> columns are in a fixed order, it may be possible to store the writers in an
> array and gain some efficiency there.
> Also, at present, if the columns are heterogeneous, Drill requires the user
> to use allTextMode to query the data. It would be nice if Drill could query
> the data without having to set that option.
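
The array-of-writers idea in the description can be illustrated with a minimal
sketch. This is not the plugin's code: CellWriter, ListWriter, and the method
names below are hypothetical stand-ins, and Drill's real column writers have a
different interface. The sketch only contrasts a per-cell hash lookup with
writers resolved once into an array that follows the fixed column order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WriterArraySketch {

  /** Hypothetical stand-in for a per-column writer; not Drill's writer API. */
  interface CellWriter {
    void write(Object value);
  }

  /** Collects written values in a list so the result can be inspected. */
  static final class ListWriter implements CellWriter {
    final List<Object> values = new ArrayList<>();

    @Override
    public void write(Object value) {
      values.add(value);
    }
  }

  /** Lookup-per-cell approach: one hash lookup for every column of every row. */
  static void writeRowByName(Map<String, CellWriter> writers, String[] names, Object[] row) {
    for (int i = 0; i < row.length; i++) {
      writers.get(names[i]).write(row[i]);
    }
  }

  /** Resolve each writer once, in column order, after the schema is known. */
  static CellWriter[] toArray(Map<String, CellWriter> writers, String[] names) {
    CellWriter[] ordered = new CellWriter[names.length];
    for (int i = 0; i < names.length; i++) {
      ordered[i] = writers.get(names[i]);
    }
    return ordered;
  }

  /** Array approach: the column position indexes the writer directly. */
  static void writeRowByIndex(CellWriter[] writers, Object[] row) {
    for (int i = 0; i < row.length; i++) {
      writers[i].write(row[i]);
    }
  }

  public static void main(String[] args) {
    String[] names = {"id", "name"};
    Map<String, CellWriter> byName = new LinkedHashMap<>();
    for (String n : names) {
      byName.put(n, new ListWriter());
    }
    CellWriter[] byIndex = toArray(byName, names);

    writeRowByName(byName, names, new Object[] {1.0, "a"});
    writeRowByIndex(byIndex, new Object[] {2.0, "b"});

    // Both paths feed the same writer instances.
    System.out.println(((ListWriter) byIndex[0]).values);
    System.out.println(((ListWriter) byIndex[1]).values);
  }
}
```

Both paths write to the same writer instances, so the array only changes how a
writer is located. Whether this yields a measurable gain would need to be
confirmed by profiling the reader.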