paul-rogers commented on a change in pull request #1986: Additional changes for Drill Metastore docs
URL: https://github.com/apache/drill/pull/1986#discussion_r384869981
##########
File path: _docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
##########
@@ -10,6 +10,31 @@ The Metastore is a Beta feature; it is subject to change. We encourage you to tr
 Because the Metastore is in Beta, the SQL commands and Metastore formats may change in the next release.
 {% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only and is disabled by default.{% include endnote.html %}
+## Drill Metastore introduction
+
+One of the main advantages of Drill is schema-on-read. But Drill can’t handle some cases with this approach, there are the issues related to Schema Evolution and Schema Changes.
+
+Significant benefits of schema-aware execution:
+
+ - At Planning time:
+   - Better scope for planning optimizations.
+   - Proper estimation of column widths since types are known, hence more accurate costing.
+   - Graceful early exit if certain data type validations fail.
+ - At Runtime:
+   - Avoids some cases with `SchemaChange` exceptions. All minor fragments will have a common understanding of the schema.
+
+Reading the data along with its statistics metadata helps to build more efficient plans and optimize query execution:
+
+ - Crucial for optimal join planning, 2-phase aggregation vs 1-phase aggregation planning, selectivity estimation of filter conditions, parallelization decisions.
+
+Taking into account the above points, existing query processing can be improved by:
+
+ - storing table schema and reusing it;
+ - collecting, storing and reusing table statistics to improve query planning.
+
+One of the main steps to resolve all these goals is providing the framework for Metadata management named hereafter

Review comment:
lowercase "metadata"

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.