paul-rogers commented on a change in pull request #1986: Additional changes for Drill Metastore docs
URL: https://github.com/apache/drill/pull/1986#discussion_r384869981
 
 

 ##########
 File path: _docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##########
 @@ -10,6 +10,31 @@ The Metastore is a Beta feature; it is subject to change. We encourage you to tr
 Because the Metastore is in Beta, the SQL commands and Metastore formats may change in the next release.
 {% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only and is disabled by default.{% include endnote.html %}
 
+## Drill Metastore introduction
+
+One of the main advantages of Drill is schema-on-read. However, this approach cannot handle some cases, such as those related to schema evolution and schema changes.
+
+Significant benefits of schema-aware execution:
+
+ - At planning time:
+    - Better scope for planning optimizations.
+    - Accurate estimation of column widths, since types are known, and hence more accurate costing.
+    - Graceful early exit if certain data type validations fail.
+ - At runtime:
+    - Avoids some `SchemaChange` exceptions, because all minor fragments share a common understanding of the schema.
+
+Reading table statistics along with the data helps Drill build more efficient plans and optimize query execution:
+
+ - Statistics are crucial for optimal join planning, choosing between 1-phase and 2-phase aggregation, estimating the selectivity of filter conditions, and making parallelization decisions.
+
+Given the points above, existing query processing can be improved by:
+
+ - storing and reusing the table schema;
+ - collecting, storing, and reusing table statistics to improve query planning.
+
+One of the main steps to resolve all these goals is providing the framework for Metadata management named hereafter
 
 Review comment:
   lowercase "metadata"
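The proposed introduction says that stored statistics help with selectivity estimation of filter conditions. As an illustration only, here is a minimal Python sketch of the textbook uniform-distribution estimate that such statistics (number of distinct values, min/max) make possible. It is not Drill's planner code: `ColumnStats`, `eq_selectivity`, `less_than_selectivity`, `estimate_rows`, and the fallback constants are hypothetical names and values chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics, as a metastore might store them."""
    ndv: Optional[int] = None          # number of distinct values, if collected
    min_value: Optional[float] = None  # smallest value seen, if collected
    max_value: Optional[float] = None  # largest value seen, if collected


# Hypothetical fixed guesses used when no statistics exist for a column.
DEFAULT_EQ_SELECTIVITY = 0.15
DEFAULT_RANGE_SELECTIVITY = 0.5


def eq_selectivity(stats: Optional[ColumnStats]) -> float:
    """Fraction of rows matching `col = constant`, assuming uniformly spread values."""
    # A missing or zero NDV means we cannot use statistics; fall back to the guess.
    if stats is None or not stats.ndv:
        return DEFAULT_EQ_SELECTIVITY
    return 1.0 / stats.ndv


def less_than_selectivity(stats: Optional[ColumnStats], bound: float) -> float:
    """Fraction of rows matching `col < bound`, assuming a uniform spread on [min, max]."""
    if stats is None or stats.min_value is None or stats.max_value is None:
        return DEFAULT_RANGE_SELECTIVITY
    lo, hi = stats.min_value, stats.max_value
    if hi <= lo:
        # Single-valued column: either every row matches or none does.
        return 1.0 if lo < bound else 0.0
    # Linear interpolation within [min, max], clamped to [0, 1].
    fraction = (bound - lo) / (hi - lo)
    return min(1.0, max(0.0, fraction))


def estimate_rows(row_count: int, selectivity: float) -> float:
    """Estimated number of rows surviving a filter with the given selectivity."""
    return row_count * selectivity


if __name__ == "__main__":
    region = ColumnStats(ndv=50)
    amount = ColumnStats(min_value=0.0, max_value=1000.0)
    print(estimate_rows(1_000_000, eq_selectivity(region)))                 # 20000.0
    print(estimate_rows(1_000_000, less_than_selectivity(amount, 250.0)))   # 250000.0
    print(estimate_rows(1_000_000, eq_selectivity(None)))                   # 150000.0
```

The sketch contrasts two paths: with collected statistics the estimate follows the data, while without them the planner can only fall back to fixed guesses, which is the kind of gap that storing and reusing statistics is meant to close.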
