paul-rogers commented on a change in pull request #1953: Add docs for Drill 

 File path: 
 @@ -0,0 +1,158 @@
+parent: "SQL Commands"
+date: 2020-01-13
+Starting from Drill 1.17, you can store table metadata (including schema and 
computed statistics) into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan 
+{% include startnote.html %}In Drill 1.17, this feature is supported for 
Parquet tables only and is disabled by default.{% include endnote.html %}
+To enable Drill Metastore usage, set the `metastore.enabled` option to 
`true`, as shown:
+       SET `metastore.enabled` = true;
+Alternatively, you can enable the option in the Drill Web UI at 
+## Syntax
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+       ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+       REFRESH METADATA ['level' LEVEL]
+       [{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+       [ SAMPLE number PERCENT ]]
+## Parameters
+The name of the table or directory for which Drill will collect table 
metadata. If the table does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store 
metadata. The stored schema will include all table columns.
+Specifies that Drill should not collect or store metadata for any table columns.
+Optional varchar literal which specifies maximum level depth for collecting 
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. 
Default is `ALL`.
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to 
`false`), an error will be thrown when this clause is specified.
+Generates estimated statistics for the table to be stored into the Metastore. 
This clause is not currently supported.
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+Optional. Indicates that compute statistics should run on a subset of the data.
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute 
statistics. For example, if a table has 100 rows, `SAMPLE 50 PERCENT` indicates 
that statistics should be computed on 50 rows. The optimizer selects the rows 
at random. 
+## Related Options
+- **metastore.enabled**
+Enables Drill Metastore usage, so that Drill can store table metadata when 
ANALYZE TABLE commands run and can read table metadata when regular queries 
run or when some INFORMATION_SCHEMA tables are queried. Default is `false`.
+- ****
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of times to retry query planning after detecting 
that query metadata has changed.
+If the number of retries is exceeded, the query is planned without metadata 
information from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using the file metadata cache when required metadata is absent 
from the Metastore. Default is `true`.
+- **metastore.metadata.use_schema**
+Enables use of the schema stored in the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables use of statistics stored in the Metastore at the planning stage. 
Default is `true`.
 Review comment:
   (How does this option relate to the planner option above? Must I enable 
both? Why would I disable one or the other?
   I wonder, should this stuff be in an appendix somewhere and not here? As a 
(somewhat informed) reader, I find myself becoming more and more confused about 
how this stuff even works! Maybe only list the key things here: options that 
represent knobs that the user might actually want to twiddle.)
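
   To make the question concrete, here is a sketch of the three settings as they
   appear in the text above. The option names are taken from the doc itself; that
   all three need to be set together is only my assumption, not something the doc
   states, and it is exactly what I would like clarified.

```sql
-- Option named at the top of the doc: turns Metastore usage on (default false).
SET `metastore.enabled` = true;

-- Planner option mentioned under the statistics clause: the doc says an error
-- is thrown if statistics are requested while this is false.
SET `planner.enable_statistics` = true;

-- Metastore-side option under review (default true per the doc).
-- Assumption: this is a separate switch from the planner option above.
SET `metastore.metadata.use_statistics` = true;
```

   If the last two always have to agree, one of them is probably not a knob a
   user needs to see here.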

