arina-ielchiieva commented on a change in pull request #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335541785

##########
File path: contrib/format-excel/README.md
##########
@@ -0,0 +1,36 @@
+# Excel Format Plugin
+This plugin enables Drill to read Microsoft Excel files. This format is best used with Excel files that do not have extensive formatting, however it will work with formatted files, by allowing you to define a region within the file where the data is.
+
+The plugin will automatically evaluate cells which contain formulae.
+
+## Plugin Configuration
+This plugin has several configuration variables which must be set in order to read Excel files effectively. Since Excel files often contain other elements besides data, you can use the configuration variables to define a region within your spreadsheet in which Drill should extract data. This is potentially useful if your spreadsheet contains a lot of formatting or other complications.
+
+* `headerRow`: Set to -1 if there are no column headers.
+* `lastRow`: This defines the last row of your data. The default is an arbitrary large number. You only will need to set this if you want Drill to stop reading at a specific location.
+* `sheetName`: This is the name of the sheet you want to query. This will default to the first sheet in the file if left undefined.
+* `firstColumn`: If you want to define a region within a spreadsheet, this is the left-most column index. This is indexed from one. If set to `0` Drill will start at the left most column.
+* `lastColumn`: If you want to define a region within a spreadsheet, this is the right-most column index. This is indexed from one. If set to `0` Drill will read all available columns. This is not inclusive, so if you ask for columns 2-5 you will get columns 2,3 and 4.
+
+## Usage
+You can specify the configuration at runtime via the `table()` method or in the storage plugin configuration.
For instance, if you just want to query an Excel file, you could execute the query as follows:
+
+```
+SELECT <fields>
+FROM dfs.`somefile.xlsx`
+```
+
+If you wanted to query a different sheet other than the default, use the `table()` method as shown below:
+```
+SELECT <fields>
+FROM table( dfs.`test_data.xlsx` (type => 'excel', sheetName => 'secondSheet'))
+```
+Theoretically, you could join data together from different sheets as follows:

Review comment:
Why "theoretically"? Did you try this? If yes and it works, then please remove "theoretically" :) If it does not, better to remove the example.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services