[jira] [Resolved] (DRILL-7423) Create More Efficient Way to Read Excel Cells

Charles Givre (Jira) Mon, 30 Mar 2020 06:39:20 -0700


     [ https://issues.apache.org/jira/browse/DRILL-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Charles Givre resolved DRILL-7423.
----------------------------------
    Resolution: Resolved

> Create More Efficient Way to Read Excel Cells
> ---------------------------------------------
>
>                 Key: DRILL-7423
>                 URL: https://issues.apache.org/jira/browse/DRILL-7423
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.18.0
>            Reporter: Charles Givre
>            Priority: Major
>
> The Excel format plugin reads cells but there are ways to make the reading 
> process more efficient.  Since the schema of an Excel file is not known in 
> advance, Drill must read the first row of data in order to extract the 
> schema.  
> It is actually a bit more complex.  To read the schema, Drill must first read 
> the header rows and convert them all into Strings.  This gets us the header 
> names if present.
> Drill cannot create writers until it reads the first row of data, which is 
> where it determines the data types.  This creates an inefficiency: when 
> Drill writes the columns, it has to do a hash lookup for each column.  Since 
> the columns are in a fixed order, it may be possible to store the writers in 
> an array and gain some efficiency there.
> Also, at present, if the columns are heterogeneous, Drill requires the user 
> to use allTextMode to query the data.  It would be nice if Drill could query 
> the data without having to set that.
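
The writer-lookup idea in the description can be illustrated with a minimal, self-contained sketch. It is not the Excel plugin's code: CellWriter, ListWriter, writeRowByName and writeRowByIndex are hypothetical names invented for illustration and do not correspond to Drill's writer API. It only contrasts a per-cell hash lookup by column name with direct indexing into an array of writers kept in column order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WriterLookupSketch {

    /** Hypothetical stand-in for a per-column writer; not Drill's actual writer API. */
    interface CellWriter {
        void write(Object value);
    }

    /** Collects values in memory so the sketch runs without Drill. */
    static final class ListWriter implements CellWriter {
        final List<Object> values = new ArrayList<>();

        @Override
        public void write(Object value) {
            values.add(value);
        }
    }

    /** Approach described as current: one hash lookup per cell, keyed by column name. */
    static void writeRowByName(Map<String, CellWriter> writers, String[] header, Object[] row) {
        for (int i = 0; i < row.length; i++) {
            writers.get(header[i]).write(row[i]);
        }
    }

    /** Suggested approach: writers stored in column order, so the cell index selects the writer. */
    static void writeRowByIndex(CellWriter[] writers, Object[] row) {
        for (int i = 0; i < row.length; i++) {
            writers[i].write(row[i]);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        String[] header = {"id", "name", "price"};
        Object[][] rows = {{1.0, "apple", 0.5}, {2.0, "pear", 0.75}};

        Map<String, CellWriter> byName = new HashMap<>();
        CellWriter[] byIndex = new CellWriter[header.length];
        for (int i = 0; i < header.length; i++) {
            byName.put(header[i], new ListWriter());
            byIndex[i] = new ListWriter();
        }

        for (Object[] row : rows) {
            writeRowByName(byName, header, row);
            writeRowByIndex(byIndex, row);
        }

        // Both layouts receive the same values; only the per-cell lookup differs.
        System.out.println(((ListWriter) byName.get("name")).values);
        System.out.println(((ListWriter) byIndex[1]).values);
    }
}
```

Both print statements show the same column contents. The array variant relies on the assumption stated in the description that columns arrive in a fixed order; if a sheet could reorder or skip columns mid-file, the index would no longer identify the writer and a name-based lookup would still be needed.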



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
