[
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16817538#comment-16817538
]
ASF GitHub Bot commented on DRILL-7177:
---------------------------------------
cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel
Files
URL: https://github.com/apache/drill/pull/1749
This pull request enables Drill to query Excel files.
# Excel Format Plugin
This plugin enables Drill to read Microsoft Excel files. This format is
best used with Excel files that do not have extensive formatting; however, it
also works with formatted files by letting you define the region within the
file that contains the data.
The plugin will automatically evaluate cells which contain formulae.
## Plugin Configuration
This plugin has several configuration variables that you can set in order
to read Excel files effectively. Because Excel files often contain elements
other than data, you can use these variables to define the region within your
spreadsheet from which Drill should extract data. This is useful if your
spreadsheet contains a lot of formatting or other complications.
* `headerRow`: The row that contains the column headers. Set to -1 if there
are no column headers.
* `lastRow`: This defines the last row of your data. The default is an
arbitrarily large number. You only need to set this if you want Drill to
stop reading at a specific row.
* `sheetName`: This is the name of the sheet you want to query. This will
default to the first sheet in the file if left undefined.
* `firstColumn`: If you want to define a region within a spreadsheet, this
is the leftmost column index. This is indexed from one. If set to `0`, Drill
will start at the leftmost column.
* `lastColumn`: If you want to define a region within a spreadsheet, this
is the rightmost column index. This is indexed from one. If set to `0`, Drill
will read all available columns. The last column is not inclusive, so if you
ask for columns 2-5 you will get columns 2, 3, and 4.
## Usage
You can specify the configuration at runtime via the `table()` method or in
the storage plugin configuration. For instance, if you just want to query an
Excel file, you could execute the query as follows:
```
SELECT <fields>
FROM dfs.`somefile.xlsx`
```
If you want to query a sheet other than the default, use the `table()`
method as shown below:
```
SELECT <fields>
FROM table( dfs.`test_data.xlsx` (type => 'excel',
  sheetName => 'secondSheet'))
```
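The region options described under Plugin Configuration can presumably be
supplied the same way. The following is only a sketch: it assumes that
`firstColumn` and `lastColumn` are accepted as `table()` parameters alongside
`sheetName`, and it reuses the `test_data.xlsx` file and `secondSheet` sheet
name from the example above. Given the non-inclusive `lastColumn` behavior
described earlier, this query would be expected to read columns 2, 3, and 4:

```
SELECT *
FROM table( dfs.`test_data.xlsx` (type => 'excel',
  sheetName => 'secondSheet',
  firstColumn => 2, lastColumn => 5))
```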
In principle, you can also join data from different sheets as follows:
```
SELECT <fields>
FROM table( dfs.`test_data.xlsx` (type => 'excel',
  sheetName => 'secondSheet')) AS t1
INNER JOIN table( dfs.`test_data.xlsx` (type => 'excel',
  sheetName => 'thirdSheet')) AS t2
ON t1.id = t2.id
```
> Format Plugin for Excel Files
> -----------------------------
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query
> Microsoft Excel files.