arina-ielchiieva commented on a change in pull request #1749: DRILL-7177: 
Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335541785
 
 

 ##########
 File path: contrib/format-excel/README.md
 ##########
 @@ -0,0 +1,36 @@
+# Excel Format Plugin
+This plugin enables Drill to read Microsoft Excel files.  This format is best 
used with Excel files that do not have extensive formatting; however, it will 
also work with formatted files by letting you define the region within the file 
that contains the data.  
+
+The plugin will automatically evaluate cells which contain formulae.  
+
+## Plugin Configuration 
+This plugin has several configuration variables which must be set in order to 
read Excel files effectively.  Since Excel files often contain other elements 
besides data, you can use the configuration variables to define a region within 
your spreadsheet in which Drill should extract data.  This is potentially 
useful if your spreadsheet contains a lot of formatting or other complications. 
+
+* `headerRow`:  Set to -1 if there are no column headers.  
+* `lastRow`:  This defines the last row of your data.  The default is an 
arbitrarily large number.  You will only need to set this if you want Drill to 
stop reading at a specific location.
+* `sheetName`:  This is the name of the sheet you want to query.  This will 
default to the first sheet in the file if left undefined. 
+* `firstColumn`:  If you want to define a region within a spreadsheet, this is 
the left-most column index.  This is indexed from one.  If set to `0`, Drill 
will start at the left-most column.
+* `lastColumn`:  If you want to define a region within a spreadsheet, this is 
the right-most column index.  This is indexed from one.  If set to `0`, Drill 
will read all available columns.  This is not inclusive, so if you ask for 
columns 2-5 you will get columns 2, 3, and 4. 
+
+## Usage
+You can specify the configuration at runtime via the `table()` method or in 
the storage plugin configuration.  For instance, if you just want to query an 
Excel file, you could execute the query as follows:
+
+```
+SELECT <fields> 
+FROM dfs.`somefile.xlsx`
+```
+
+If you want to query a sheet other than the default, use the 
`table()` method as shown below:
+```
+SELECT <fields> 
+FROM table( dfs.`test_data.xlsx` (type => 'excel', sheetName => 'secondSheet'))
+```
+Theoretically, you could join data together from different sheets as follows:
 
 Review comment:
   Why theoretically? Did you try this? If yes and it works, then please 
remove "theoretically" :) If it does not, better to remove the example.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
