[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Volodymyr Vysotskyi updated DRILL-7271:
---------------------------------------
    Description: 
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map<String, StatisticsHolder>.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
to metadata module, rename it to MetadataType and add a new value: SEGMENT.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  MetadataType type; // enum
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify
----------------
private final Map<String, Object> tableStatistics;
private final Map<String, StatisticsKind> statisticsKinds;
private final Set<String> partitionKeys; -> Map<String, String>
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List<String>)
location (String) (for directory level metadata) - directory location

fields to modify
----------------
private final Map<String, Object> tableStatistics;
private final Map<String, StatisticsKind> statisticsKinds;
private final Set<Path> location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify
----------------
private final Map<String, Object> tableStatistics;
private final Map<String, StatisticsKind> statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify
----------------
private final Map<String, Object> tableStatistics;
private final Map<String, StatisticsKind> statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into 
sub-packages.
11. Rename FileTableMetadata -> BaseTableMetadata
12. Rename TableMetadataProvider.getNonInterestingColumnsMeta() ->
getNonInterestingColumnsMetadata()
13. Introduce segment-level metadata class:
{noformat}
class SegmentMetadata {
  TableInfo tableInfo;
  MetadataInfo metadataInfo;
  SchemaPath column;
  TupleMetadata schema;
  String location;
  Map<SchemaPath, ColumnStatistics> columnsStatistics;
  Map<String, StatisticsHolder> statistics;
  List<String> partitionValues;
  List<String> locations;
  long lastModifiedTime;
}
{noformat}
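
Item 1 in the list above merges statistic values and their kinds into one holder map. A minimal sketch of how that could look, assuming a simplified StatisticsHolder that stores only a value and the name of its kind. The real StatisticsHolder and StatisticsKind in Drill may differ, the kind is reduced to a String here, and mergeStatistics is a hypothetical helper used only for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified holder: pairs a statistic value with the name of its kind.
final class StatisticsHolder {
  private final Object value;
  private final String kindName;

  StatisticsHolder(Object value, String kindName) {
    this.value = value;
    this.kindName = kindName;
  }

  Object getValue() { return value; }

  String getKindName() { return kindName; }
}

final class StatisticsMergeSketch {

  // Combines the two parallel maps (values and kinds, both keyed by statistic
  // name) into a single map. Entries without a matching kind are skipped.
  static Map<String, StatisticsHolder> mergeStatistics(
      Map<String, Object> tableStatistics,
      Map<String, String> statisticsKinds) {
    Map<String, StatisticsHolder> merged = new HashMap<>();
    for (Map.Entry<String, Object> entry : tableStatistics.entrySet()) {
      String kind = statisticsKinds.get(entry.getKey());
      if (kind != null) {
        merged.put(entry.getKey(), new StatisticsHolder(entry.getValue(), kind));
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> values = new HashMap<>();
    values.put("rowCount", 100L);
    Map<String, String> kinds = new HashMap<>();
    kinds.put("rowCount", "rowCount");

    Map<String, StatisticsHolder> merged = mergeStatistics(values, kinds);
    System.out.println(merged.get("rowCount").getValue());
  }
}
```

Keeping the value and its kind in one object means the two can no longer drift apart, which is a risk with two maps that must share the same keys.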

h1. Segment metadata
One of the changes in the fix for this Jira is the introduction of
segment-level metadata.

For now, the metadata hierarchy is the following:
- Table
- Segment
- Partition
- File
- Row group

A segment represents a part of the table grouped by some specific qualities.
For example, for file system tables a segment may correspond to a directory
and the data in it. For Hive tables, a segment corresponds to a Hive
partition.

In contrast, partition metadata will correspond to "Drill partitions": groups
of data which have the same values for specific columns within a file or
row group.

So filtering will be applied at the table level first, then for segments, then
for partitions, then for files, and finally for row groups.
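
The top-down filtering order described above can be sketched as follows. This is only an illustration under assumptions: the issue names MetadataType and its new SEGMENT value, but the other constant names are guessed from the hierarchy list, and visitedLevels with its predicate is a hypothetical stand-in for the real filter evaluation, which works on individual metadata units rather than on whole levels:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Metadata levels ordered from coarsest to finest, following the hierarchy list.
// Only SEGMENT is named explicitly in the issue; the other constant names are assumed.
enum MetadataType {
  TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP
}

final class FilterOrderSketch {

  // Returns the levels visited by top-down pruning, stopping at the first
  // level whose metadata is rejected by the (hypothetical) filter.
  static List<MetadataType> visitedLevels(Predicate<MetadataType> matches) {
    List<MetadataType> visited = new ArrayList<>();
    for (MetadataType type : MetadataType.values()) {
      visited.add(type);
      if (!matches.test(type)) {
        break; // nothing below a pruned level needs to be checked
      }
    }
    return visited;
  }

  public static void main(String[] args) {
    // The filter rejects the data at partition level, so files and row groups
    // are never inspected.
    System.out.println(visitedLevels(type -> type != MetadataType.PARTITION));
    // prints [TABLE, SEGMENT, PARTITION]
  }
}
```

Checking the coarser levels first lets a filter discard a whole segment or partition before any file or row group metadata is examined.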

  was:
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map<String, StatisticsHolder>.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: SEGMENT.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify
----------------
private final Map<String, Object> tableStatistics;
private final Map<String, StatisticsKind> statisticsKinds;
private final Set<String> partitionKeys; -> Map<String, String>
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List<String>)
location (String) (for directory level metadata) - directory location

fields to modify
----------------
private final Map<String, Object> tableStatistics;
private final Map<String, StatisticsKind> statisticsKinds;
private final Set<Path> location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify
----------------
private final Map<String, Object> tableStatistics;
private final Map<String, StatisticsKind> statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
------------------
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify
----------------
private final Map<String, Object> tableStatistics;
private final Map<String, StatisticsKind> statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into 
sub-packages.
11. Rename FileTableMetadata -> BaseTableMetadata
12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
getNonInterestingColumnsMetadata
13. Introduce segment-level metadata class:
{noformat}
class SegmentMetadata {
  TableInfo tableInfo;
  MetadataInfo metadataInfo;
  SchemaPath column;
  TupleMetadata schema;
  String location;
  Map<SchemaPath, ColumnStatistics> columnsStatistics;
  Map<String, StatisticsHolder> statistics;
  List<String> partitionValues;
  List<String> locations;
  long lastModifiedTime;
}
{noformat}


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7271
>                 URL: https://issues.apache.org/jira/browse/DRILL-7271
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Arina Ielchiieva
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.17.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
