Billy Liu commented on KYLIN-3221:


> Some improvements for lookup table 
> -----------------------------------
>                 Key: KYLIN-3221
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3221
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine, Metadata, Query Engine
>            Reporter: Ma Gang
>            Assignee: Ma Gang
>            Priority: Major
> The current lookup table design has two limitations:
>  # Lookup table size is limited: the table snapshot must be cached in the 
> Kylin server, so an overly large snapshot table can break the server.
>  # Lookup table snapshot references are stored in every segment of the cube, 
> so a global snapshot table is not supported. A global snapshot table means 
> that an update to the lookup table takes effect for all segments.
> To resolve these limitations, we plan to make some improvements to the 
> existing lookup table design. The initial design document is below; comments 
> and suggestions are welcome.
> h2. Metadata
> A new property will be added to CubeDesc to describe how lookup table 
> snapshots are taken; it can be defined during cube design:
> |{{@JsonProperty("snapshot_table_desc_list")}}
> {{private List<SnapshotTableDesc> snapshotTableDescList = Collections.emptyList();}}|
>  SnapshotTableDesc defines how a table snapshot is stored and whether it is 
> global. Two store types are currently supported:
>  # "metaStore": the table snapshot is stored in the metadata store, the same 
> as in the current design. This is the default option.
>  # "hbaseStore": the table snapshot is stored in an additional HBase table.
> |{{@JsonProperty("table_name")}}
> {{private String tableName;}}
> {{@JsonProperty("store_type")}}
> {{private String snapshotStorageType = "metaStore";}}
> {{@JsonProperty("global")}}
> {{private boolean global = false;}}|
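The defaults described above could be exercised roughly as follows. This is only a sketch: `SnapshotTableDescSketch` is a plain-Java stand-in for the proposed class (Jackson annotations omitted), and both the `find` helper and the table names are hypothetical, not part of Kylin.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the proposed SnapshotTableDesc (Jackson annotations omitted).
final class SnapshotTableDescSketch {
    final String tableName;
    final String storageType; // "metaStore" (default) or "hbaseStore"
    final boolean global;     // false (default): snapshot is kept per segment

    SnapshotTableDescSketch(String tableName, String storageType, boolean global) {
        this.tableName = tableName;
        this.storageType = storageType;
        this.global = global;
    }

    // Hypothetical helper: a table without an explicit entry gets the documented defaults.
    static SnapshotTableDescSketch find(List<SnapshotTableDescSketch> descs, String tableName) {
        for (SnapshotTableDescSketch desc : descs) {
            if (desc.tableName.equals(tableName)) {
                return desc;
            }
        }
        return new SnapshotTableDescSketch(tableName, "metaStore", false);
    }

    public static void main(String[] args) {
        List<SnapshotTableDescSketch> descs = Arrays.asList(
                new SnapshotTableDescSketch("DEFAULT.BIG_LOOKUP", "hbaseStore", true));
        System.out.println(find(descs, "DEFAULT.BIG_LOOKUP").storageType);   // hbaseStore
        System.out.println(find(descs, "DEFAULT.SMALL_LOOKUP").storageType); // metaStore
    }
}
```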
> Add a 'snapshots' property to CubeInstance to store the snapshot resource 
> path for each table whose snapshot is set to global in the cube design:
> |{{@JsonProperty("snapshots")}}
> {{private Map<String, String> snapshots; // tableName -> tableResourcePath mapping}}|
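One way to read this: a global snapshot is resolved from the cube-level map, while a non-global snapshot is still resolved from the segment. The sketch below illustrates that reading only; `SnapshotResolverSketch`, its `resolve` method, and the placeholder paths are assumptions, not the proposed Kylin code.

```java
import java.util.HashMap;
import java.util.Map;

final class SnapshotResolverSketch {

    // Hypothetical resolution rule: global snapshots come from the cube-level map,
    // all others from the map held by the segment being queried.
    static String resolve(String tableName, boolean global,
                          Map<String, String> cubeSnapshots,
                          Map<String, String> segmentSnapshots) {
        Map<String, String> source = global ? cubeSnapshots : segmentSnapshots;
        String path = source.get(tableName);
        if (path == null) {
            throw new IllegalStateException("no snapshot registered for " + tableName);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> cubeSnapshots = new HashMap<>();
        cubeSnapshots.put("DEFAULT.BIG_LOOKUP", "/placeholder/global.snapshot");
        Map<String, String> segmentSnapshots = new HashMap<>();
        segmentSnapshots.put("DEFAULT.SMALL_LOOKUP", "/placeholder/segment.snapshot");

        System.out.println(resolve("DEFAULT.BIG_LOOKUP", true, cubeSnapshots, segmentSnapshots));
        System.out.println(resolve("DEFAULT.SMALL_LOOKUP", false, cubeSnapshots, segmentSnapshots));
    }
}
```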
> Add a new meta model, ExtTableSnapshot, to describe the extended table 
> snapshot information. It is stored under a new metastore path, 
> /ext_table_snapshot/\{tableName}/\{uuid}.snapshot, and includes the 
> following fields:
> |{{@JsonProperty("tableName")}}
> {{private String tableName;}}
> {{@JsonProperty("signature")}}
> {{private TableSignature signature;}}
> {{@JsonProperty("storage_location_identifier")}}
> {{private String storageLocationIdentifier;}}
> {{@JsonProperty("size")}}
> {{private long size;}}
> {{@JsonProperty("row_cnt")}}
> {{private long rowCnt;}}|
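As a small illustration of the metastore path layout, the following sketch builds a resource path of the form /ext_table_snapshot/{tableName}/{uuid}.snapshot. The class and method names are hypothetical, and the table name is only a placeholder.

```java
import java.util.UUID;

final class ExtSnapshotPathSketch {

    // Builds /ext_table_snapshot/{tableName}/{uuid}.snapshot as described in the proposal.
    static String resourcePath(String tableName, String uuid) {
        return "/ext_table_snapshot/" + tableName + "/" + uuid + ".snapshot";
    }

    public static void main(String[] args) {
        // A fresh random UUID per snapshot is an assumption made for this example.
        System.out.println(resourcePath("DEFAULT.BIG_LOOKUP", UUID.randomUUID().toString()));
    }
}
```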
> Add a new section to the 'Advanced Setting' tab of cube design where the 
> user can set snapshot properties for each lookup table. By default, a 
> snapshot is segment-level and stored in the metadata store.
> h2. Build
> If the user specifies the 'hbaseStore' storage type for any lookup table, a 
> MapReduce job converts the Hive source table to HFiles, which are then bulk 
> loaded into an HTable. This adds two job steps for lookup table 
> materialization.
> h2. HBase Lookup Table Schema
> All data is stored as raw values.
> Suppose the lookup table has primary keys key1 and key2.
> The rowkey will be:
> ||2 bytes||len1 bytes||2 bytes||len2 bytes||
> |key1 value length (len1)|key1 value|key2 value length (len2)|key2 value|
> There is one column family, c, with multiple columns; each column name is 
> the index of the column in the table definition:
> |c|
> |1|2|...|
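The rowkey layout above can be sketched as a simple length-prefixed encoding. The proposal does not state the byte order of the 2-byte length or how values are turned into bytes, so big-endian lengths and UTF-8 strings are assumptions here, and `LookupRowKeySketch` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Kylin: builds a composite rowkey as
// [2-byte length][value bytes], repeated for each primary key column.
final class LookupRowKeySketch {

    // Assumption: each length is an unsigned 16-bit big-endian integer.
    static byte[] encode(byte[]... keyValues) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] value : keyValues) {
            if (value.length > 0xFFFF) {
                throw new IllegalArgumentException("key value exceeds the 2-byte length limit");
            }
            out.write((value.length >>> 8) & 0xFF); // high byte of the length
            out.write(value.length & 0xFF);         // low byte of the length
            out.write(value, 0, value.length);      // raw key value
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] rowKey = encode(
                "CN".getBytes(StandardCharsets.UTF_8),
                "2018".getBytes(StandardCharsets.UTF_8));
        System.out.println(rowKey.length); // 2 + 2 + 2 + 4 = 10
    }
}
```

Because every key value carries its own length prefix, the composite key can be split back into its parts without a separator byte.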
> h2. Query
> For a key lookup query, call the HBase Get API directly to fetch the entire 
> row by key.
> For queries that need to fetch keys by derived column values, iterate over 
> all rows to find the matching keys.
> For queries that hit only the lookup table, iterate over all rows and let 
> Calcite do the aggregation and filtering.
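The difference between the first two access paths can be illustrated with an in-memory stand-in for the HBase table. This sketch does not use the HBase client API; the map-based `LookupQuerySketch` class and the sample rows are assumptions made purely to contrast a point read with a full iteration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LookupQuerySketch {
    // Stand-in for the HBase table: primary key -> column values, indexed by column position.
    private final Map<String, String[]> rows = new LinkedHashMap<>();

    void put(String key, String... columns) {
        rows.put(key, columns);
    }

    // Key lookup: a single point read, analogous to fetching one row by rowkey.
    String[] getRow(String key) {
        return rows.get(key);
    }

    // Derived-column filter: every row is visited, analogous to a full table iteration.
    List<String> keysWhere(int columnIndex, String value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String[]> row : rows.entrySet()) {
            if (value.equals(row.getValue()[columnIndex])) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        LookupQuerySketch table = new LookupQuerySketch();
        table.put("CN", "China", "Asia");
        table.put("JP", "Japan", "Asia");
        table.put("FR", "France", "Europe");
        System.out.println(table.getRow("JP")[0]);      // Japan
        System.out.println(table.keysWhere(1, "Asia")); // [CN, JP]
    }
}
```

The cost of the full iteration grows with table size, which is the motivation given later for a coprocessor or a secondary index.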
> h2. Management
> For each lookup table, the admin can view how many snapshots it has in Kylin, 
> as well as each snapshot's type and size and the cubes/segments that 
> reference it. Snapshot tables that have no references can be deleted.
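Finding the deletable snapshots amounts to a set difference between all known snapshot paths and those referenced by some cube or segment. A minimal sketch follows, with hypothetical names and placeholder paths; how the two collections are gathered from Kylin metadata is not covered here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

final class SnapshotCleanupSketch {

    // Returns the snapshot paths that no cube or segment references.
    static Set<String> unreferenced(Collection<String> allSnapshots,
                                    Collection<String> referencedSnapshots) {
        Set<String> result = new LinkedHashSet<>(allSnapshots);
        result.removeAll(referencedSnapshots);
        return result;
    }

    public static void main(String[] args) {
        Set<String> deletable = unreferenced(
                Arrays.asList("/placeholder/a.snapshot", "/placeholder/b.snapshot", "/placeholder/c.snapshot"),
                Arrays.asList("/placeholder/a.snapshot", "/placeholder/c.snapshot"));
        System.out.println(deletable); // [/placeholder/b.snapshot]
    }
}
```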
> h2. Cleanup
> When cleaning up the metadata store, snapshots stored in HBase must also be 
> removed. The metadata store needs to be cleaned up periodically by a cron job.
> h2. Future
>  # Add a coprocessor for lookup tables to improve the performance of lookup 
> table queries and of queries that filter by derived columns.
>  # Add secondary index support for external snapshot tables.
