paul-rogers commented on a change in pull request #2030: Update docs for 
Metastore to point that all format plugins are supported
URL: https://github.com/apache/drill/pull/2030#discussion_r395772674
 
 

 ##########
 File path: 
_docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 ##########
 @@ -103,20 +105,50 @@ Schema information and summary statistics also computed 
and stored for table seg
 
 The detailed metadata schema is described 
[here](https://github.com/apache/drill/tree/master/metastore/metastore-api#metastore-tables).
 You can try out the metadata to get a sense of what is available, by using the
- [Inspect the Metastore using `INFORMATION_SCHEMA` 
tables]({{site.baseurl}}/docs/using-drill-metastore/#inspect-the-metastore-using-information_schema-tables)
 tutorial.
+ [Inspect the Metastore using `INFORMATION_SCHEMA` 
tables](#inspect-the-metastore-using-information_schema-tables) tutorial.
 
 Every table described by the Metastore may be a bare file or one or more files 
that reside in one or more directories.
 
 If a table consists of a single directory or file, then it is non-partitioned. 
The single directory can contain any number of files.
 Larger tables tend to have subdirectories. Each subdirectory is a partition 
and such a table are called "partitioned".
-Please refer to [Exposing Drill Metastore metadata through 
`INFORMATION_SCHEMA` 
tables]({{site.baseurl}}/docs/using-drill-metastore/#exposing-drill-metastore-metadata-through-information_schema-tables)
+Please refer to [Exposing Drill Metastore metadata through 
`INFORMATION_SCHEMA` 
tables](#exposing-drill-metastore-metadata-through-information_schema-tables)
  for information, how to query partitions and segments metadata.
 
 A traditional database divides tables into schemas and tables.
 Drill can connect to any number of data sources, each of which may have its 
own schema.
 As a result, the Metastore labels tables with a combination of (plugin 
configuration name, workspace name, table name).
 Note that if before renaming any of these items, you must delete table's 
Metadata entry and recreate it after renaming.
 
+### Using schema provisioning feature with Drill Metastore
+
+The Drill Metastore holds both schema and statistics information for a table. 
The `ANALYZE` command can infer the table
+ schema for well-defined tables (such as many Parquet tables). Some tables are 
too complex or variable for Drill's
+ schema inference to work well. For example, JSON tables often omit fields or 
have long runs of nulls so that Drill
+ cannot determine column types. In these cases, you can specify the correct 
schema based on your knowledge of the
+ table's structure. You specify a schema in the `ANALYZE` command using the 
+ [Schema 
provisioning]({{site.baseurl}}/docs/plugin-configuration-basics/#specifying-the-schema-as-table-function-parameter)
 syntax.
+
+Please refer to [Provisioning schema for Drill 
Metastore](#provisioning-schema-for-drill-metastore) for examples of usage.
+
+### Schema priority
+
+Drill allows the following ways for providing table schema:
+ - providing schema with table function:
+   - specifying inline schema;
+   - specifying path to the schema file;
+ - using schema file in table root directory;
+ - using schema from Drill Metastore.
+
+The highest priority has schema provided in table function.
+
+Second priority has schema file (if `store.table.use_schema_file` is enabled).
+
+If neither of the above schema sources wasn't specified, schema from Drill 
Metastore will be used.
+
+Regardless of the source of the schema, it will be used and handled in the 
same way.
+
+Table metadata from Drill Metastore will be used if it is available regardless 
of the schema source.
+
 
 Review comment:
   Drill uses metadata during both query planning and execution. Drill gives 
you multiple ways to provide a schema.
   
   When you run the `ANALYZE TABLE` command, Drill uses the following rules to 
choose the table schema stored in the Metastore, in priority order:
   
   * A schema file, created with `CREATE OR REPLACE SCHEMA`, in the table root 
directory.
   * Schema inferred from file data.
   
   To plan a query, Drill requires information about your file partitions (if 
any) and about row and column cardinality. Drill does not use the provided 
schema for planning, because a schema alone does not carry this metadata. 
Instead, at plan time Drill obtains metadata from one of the following, again 
in priority order:
   
   * The Drill Metastore, if available.
   * Inferred from file data. Drill scans the table's directory structure to 
identify partitions. Drill estimates row counts based on the file size. Drill 
uses default estimates for column cardinality.
   
   At query execution time, a schema tells Drill the shape of your data and how 
that data should be converted to Drill's SQL types. Your choices for 
execution-time schema, in priority order, are:
   
   * A schema passed with a table function, which either:
      - specifies an inline schema, or
      - specifies the path to a schema file.
   * A schema file, created with `CREATE OR REPLACE SCHEMA`, in the table 
root directory.
   * The schema from the Drill Metastore, if available.
   * A schema inferred directly from file data.
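
A rough way to read the three priority lists above is as three small lookup functions. The sketch below is illustrative Python only, not Drill code: the enum, the function names, and the boolean parameters are all hypothetical, and it does not model the `store.table.use_schema_file` option mentioned in the diff (treat `has_schema_file` as "a schema file exists and Drill is allowed to use it").

```python
from enum import Enum


class Source(Enum):
    """Hypothetical labels for where a schema or metadata can come from."""
    TABLE_FUNCTION = "schema passed with a table function"
    SCHEMA_FILE = "schema file in the table root directory"
    METASTORE = "Drill Metastore"
    INFERRED = "inferred from file data"


def analyze_schema_source(has_schema_file: bool) -> Source:
    # ANALYZE TABLE: a schema file wins, otherwise the schema is inferred.
    return Source.SCHEMA_FILE if has_schema_file else Source.INFERRED


def planning_metadata_source(in_metastore: bool) -> Source:
    # Plan time: Metastore metadata if available, otherwise inferred from
    # the directory structure and file sizes. A provided schema plays no role.
    return Source.METASTORE if in_metastore else Source.INFERRED


def execution_schema_source(
    has_table_function_schema: bool,
    has_schema_file: bool,
    in_metastore: bool,
) -> Source:
    # Execution time: walk the candidates in priority order, take the first.
    if has_table_function_schema:
        return Source.TABLE_FUNCTION
    if has_schema_file:
        return Source.SCHEMA_FILE
    if in_metastore:
        return Source.METASTORE
    return Source.INFERRED
```

For example, `execution_schema_source(False, True, True)` picks the schema file even though the table is in the Metastore, while `planning_metadata_source(True)` still takes partition and cardinality metadata from the Metastore for the same query.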
   

