vvysotskyi opened a new pull request #1886: DRILL-7273: Introduce operators for 
handling metadata
URL: https://github.com/apache/drill/pull/1886
 
 
   Jira: [DRILL-7273](https://issues.apache.org/jira/browse/DRILL-7273)
   
   This pull request introduces commands and operators for collecting table 
metadata and storing it to the metastore.
   
   The entry point for the ANALYZE command is the `MetastoreAnalyzeTableHandler` 
class. It creates a plan that includes metastore-specific operators for 
collecting metadata.
   
   The new operators are:
   `MetadataAggBatch` - operator that adds aggregate calls for all incoming 
table columns to calculate the required metadata and produces aggregations. If 
the aggregation is performed on top of another aggregation, the aggregate 
calls required for merging metadata are added.
   
   `MetadataHandlerBatch` - operator responsible for handling metadata returned 
by incoming aggregate operators and fetching the required metadata from the 
metastore to produce further aggregations.
   
   `MetadataControllerBatch` - operator responsible for converting the obtained 
metadata, fetching absent metadata from the metastore and storing the 
resulting metadata in the metastore.
   
   `MetastoreAnalyzeTableHandler` relies on two classes that, depending on the 
table type, provide the information required for building a suitable plan for 
collecting metadata: `AnalyzeInfoProvider` and `MetadataInfoCollector`.
   
   Based on the segment count, `MetastoreAnalyzeTableHandler` builds a plan of 
the following form:
   
   ```
   MetadataControllerBatch
        ...
                MetadataHandlerBatch
                        MetadataAggBatch
                                MetadataHandlerBatch
                                        MetadataAggBatch
                                                Scan
   ```
   The lowest `MetadataAggBatch` creates the required aggregate calls for every 
table column (or only for the interesting ones) and produces aggregations 
grouped by the segment columns that correspond to a specific table level.
   The `MetadataHandlerBatch` above it populates the batch with additional 
information, such as the metadata type.
   The `MetadataAggBatch` above that merges the previously calculated metadata 
to obtain metadata for the parent metadata levels and also keeps the incoming 
data so that it can be stored in the metastore later.
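
   As a rough illustration of how metadata computed at one level can be merged 
into metadata for its parent level, here is a minimal sketch. It is not code 
from this pull request: the class `ColumnStatsSketch`, its fields and the 
choice of statistics (min, max, nulls count, row count) are hypothetical and 
only show the idea of aggregating on top of earlier aggregations.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-column statistics holder; NOT a Drill class.
final class ColumnStatsSketch {
    final long min;
    final long max;
    final long nullsCount;
    final long rowCount;

    ColumnStatsSketch(long min, long max, long nullsCount, long rowCount) {
        this.min = min;
        this.max = max;
        this.nullsCount = nullsCount;
        this.rowCount = rowCount;
    }

    // Merging two partial results gives the statistics of the combined data,
    // so the parent level never needs to re-read the underlying rows.
    ColumnStatsSketch merge(ColumnStatsSketch other) {
        return new ColumnStatsSketch(
            Math.min(min, other.min),
            Math.max(max, other.max),
            nullsCount + other.nullsCount,
            rowCount + other.rowCount);
    }

    // Folds the statistics of all children (e.g. files) into one parent entry.
    static ColumnStatsSketch mergeAll(List<ColumnStatsSketch> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("no statistics to merge");
        }
        ColumnStatsSketch result = parts.get(0);
        for (int i = 1; i < parts.size(); i++) {
            result = result.merge(parts.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        ColumnStatsSketch file1 = new ColumnStatsSketch(3, 40, 1, 100);
        ColumnStatsSketch file2 = new ColumnStatsSketch(-5, 12, 0, 50);
        ColumnStatsSketch segment = mergeAll(Arrays.asList(file1, file2));
        // prints "-5 40 1 150"
        System.out.println(segment.min + " " + segment.max + " "
            + segment.nullsCount + " " + segment.rowCount);
    }
}
```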
   
   `MetadataControllerBatch` obtains all of the calculated metadata, converts 
it to a suitable form and sends it to the metastore.
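
   The layering described above can be sketched as a simple loop. This is only 
an illustration: `PlanShapeSketch` and `planFor` are hypothetical names, the 
operators are represented as plain strings, and the number of 
`MetadataHandlerBatch` / `MetadataAggBatch` pairs is passed in directly, since 
how the handler derives it from the segment count is not spelled out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists operator names from the plan root to the scan.
final class PlanShapeSketch {

    // aggregationLevels: assumed number of handler/aggregate pairs in the plan.
    static List<String> planFor(int aggregationLevels) {
        if (aggregationLevels < 1) {
            throw new IllegalArgumentException("at least one level is required");
        }
        List<String> operators = new ArrayList<>();
        operators.add("MetadataControllerBatch");
        for (int level = 0; level < aggregationLevels; level++) {
            operators.add("MetadataHandlerBatch");
            operators.add("MetadataAggBatch");
        }
        operators.add("Scan");
        return operators;
    }

    public static void main(String[] args) {
        List<String> plan = planFor(2);
        StringBuilder indent = new StringBuilder();
        for (String operator : plan) {
            // Each operator is printed one step deeper than its parent.
            System.out.println(indent + operator);
            indent.append("  ");
        }
    }
}
```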
   
   For incremental analyze, `MetastoreAnalyzeTableHandler` creates a `Scan` 
over the updated files only and tells `MetadataHandlerBatch` which metadata 
should be fetched from the metastore, so that existing up-to-date metadata is 
not recalculated.
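
   The split between files to rescan and metadata to reuse could look roughly 
like the sketch below. Everything in it is an assumption made for 
illustration: `IncrementalAnalyzeSketch` is a hypothetical class, and 
comparing last-modified times is just one plausible way to detect updated 
files, not necessarily the check this pull request implements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split of table files for an incremental analyze; NOT Drill code.
final class IncrementalAnalyzeSketch {
    // Files that are new or changed and therefore go into the Scan.
    final List<String> filesToScan = new ArrayList<>();
    // Files whose stored metadata is assumed to be current and can be fetched.
    final List<String> filesToFetch = new ArrayList<>();

    // Assumption: a file is unchanged when its current last-modified time
    // equals the one recorded in the metastore.
    static IncrementalAnalyzeSketch split(Map<String, Long> currentModTimes,
                                          Map<String, Long> storedModTimes) {
        IncrementalAnalyzeSketch result = new IncrementalAnalyzeSketch();
        for (Map.Entry<String, Long> entry : currentModTimes.entrySet()) {
            Long stored = storedModTimes.get(entry.getKey());
            if (stored != null && stored.equals(entry.getValue())) {
                result.filesToFetch.add(entry.getKey());
            } else {
                result.filesToScan.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> current = new LinkedHashMap<>();
        current.put("/tbl/2019/a.parquet", 1L);
        current.put("/tbl/2019/b.parquet", 5L);
        current.put("/tbl/2020/c.parquet", 7L);

        Map<String, Long> stored = new LinkedHashMap<>();
        stored.put("/tbl/2019/a.parquet", 1L);
        stored.put("/tbl/2019/b.parquet", 2L);

        IncrementalAnalyzeSketch split = split(current, stored);
        System.out.println("scan:  " + split.filesToScan);
        System.out.println("fetch: " + split.filesToFetch);
    }
}
```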

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services