[
https://issues.apache.org/jira/browse/DRILL-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charles Givre updated DRILL-7423:
---------------------------------
Description:
The Excel format plugin reads cells, but the reading process could be made
more efficient. Because the schema of an Excel file is not known in advance,
Drill must read the first row of data in order to extract the schema.
In practice this takes two steps. To read the schema, Drill must first read
the header rows and convert every cell to a String. This yields the header
names, if present.
Drill cannot create the column writers until it reads the first row of data,
because that row determines the data types. The resulting inefficiency is that
when Drill writes the columns, it has to do a hash lookup for each column.
Since the columns are in a fixed order, it may be possible to store the writers
in a
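
To make the contrast concrete, here is a minimal Java sketch of the two lookup
strategies. The description above is cut off before it names a container, so
the ordered list used here is only an assumption, and ColumnWriter,
writeRowByName and writeRowByPosition are hypothetical names for illustration,
not Drill's actual writer API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WriterLookupSketch {

    /** Hypothetical stand-in for a column writer; not Drill's real writer class. */
    static final class ColumnWriter {
        final String name;
        final List<Object> values = new ArrayList<>();

        ColumnWriter(String name) {
            this.name = name;
        }

        void write(Object value) {
            values.add(value);
        }
    }

    /** Name-keyed approach: one hash lookup per cell, as the description reports. */
    static void writeRowByName(Map<String, ColumnWriter> writers, String[] header, Object[] row) {
        for (int i = 0; i < row.length; i++) {
            writers.get(header[i]).write(row[i]);
        }
    }

    /** Positional approach (assumed): writers kept in column order and indexed directly. */
    static void writeRowByPosition(List<ColumnWriter> writers, Object[] row) {
        for (int i = 0; i < row.length; i++) {
            writers.get(i).write(row[i]);
        }
    }

    public static void main(String[] args) {
        // Made-up sample sheet: header names plus two data rows.
        String[] header = {"id", "name", "price"};
        Object[][] rows = {{1.0, "apple", 0.5}, {2.0, "pear", 0.75}};

        // Writers are created once, after the header and first data row are known.
        Map<String, ColumnWriter> byName = new HashMap<>();
        List<ColumnWriter> byPosition = new ArrayList<>();
        for (String column : header) {
            byName.put(column, new ColumnWriter(column));
            byPosition.add(new ColumnWriter(column));
        }

        for (Object[] row : rows) {
            writeRowByName(byName, header, row);
            writeRowByPosition(byPosition, row);
        }

        // Both strategies should leave identical values in every column.
        for (int i = 0; i < header.length; i++) {
            boolean same = byName.get(header[i]).values.equals(byPosition.get(i).values);
            System.out.println(header[i] + ": " + byPosition.get(i).values + " same=" + same);
        }
    }
}
```

Both methods fill the columns identically. The positional version only replaces
the per-cell hash lookup with a direct index, which is valid only while the
column order stays fixed for the whole sheet.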
was:The Excel format plugin reads cells but there are ways to make the
reading process more efficient.
> Create More Efficient Way to Read Excel Cells
> ---------------------------------------------
>
> Key: DRILL-7423
> URL: https://issues.apache.org/jira/browse/DRILL-7423
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Charles Givre
> Priority: Major