Hi JianLiang,

Looking forward to your detailed design and PR.
Ling Miao

On Mon, Feb 22, 2021 at 7:25 PM ye qi <jianliang5...@gmail.com> wrote:

> Hi, Ling Miao.
>
> Thanks for your advice.
> I'll think about it and get back to you.
>
> Jianliang Qi
>
> On Mon, Feb 22, 2021 at 5:31 PM ling miao <lingm...@apache.org> wrote:
>
>> Hi JianLiang,
>>
>> Thank you for your proposal, I think this function is still necessary for
>> some large dimension tables.
>> This means that data that is not generated according to time can also be
>> partitioned.
>>
>> Of course, since this is a change to metadata, all loads, queries, and
>> other DDL operations may need to be changed and developed.
>> Please be considerate when designing.
>>
>> Ling Miao
>>
>> On Sun, Feb 21, 2021 at 1:12 AM ye qi <jianliang5...@gmail.com> wrote:
>>
>>> List partition
>>>
>>> Doris currently only supports Range partitioning, where data is usually
>>> partitioned by time columns.
>>>
>>> However, in some scenarios, users want to partition by some enumerated
>>> values of columns, such as by city, etc.
>>>
>>> Design
>>>
>>> To add support for List partitioning, the following functional points
>>> need to be considered.
>>>
>>> 1. Support for List partition syntax in creating table statements.
>>> 2. Support for adding and deleting List partition syntax.
>>> 3. Support for List partitioning in various load operations.
>>> 4. Support for List partition pruning during query.
>>>
>>> List partitioned tables do not need to consider dynamic partitioning.
>>>
>>> Detailed design
>>>
>>> Syntax
>>>
>>> The main changes involved here include:
>>>
>>> 1. Implementation of the subclass ListPartitionDesc of the parsing class
>>>    PartitionDesc
>>> 2. Implementation of metadata class PartitionInfo subclass
>>>    ListPartitionInfo
>>> 3. Support for parsing and checking ListPartitionDesc in CreateTableStmt
>>> 4. Support for the creation of List Partition tables in Catalog class.
>>> 5. Metadata persistence-related changes.
>>>
>>> The syntax is referenced from MySQL and Oracle
>>>
>>> Single partition column
>>>
>>> CREATE TABLE tb1 (
>>>     k1 int, k2 varchar(128), k3 int, v1 int, v2 int
>>> )
>>> PARTITION BY LIST(k1)
>>> (
>>>     PARTITION p1 VALUES IN ("1", "3", "5"),
>>>     PARTITION p2 VALUES IN ("2", "4", "6"),
>>>     ...
>>> )
>>> ...
>>> ;
>>>
>>> Multi-partition columns
>>>
>>> CREATE TABLE tb2 (
>>>     k1 int, k2 varchar(128), k3 int, v1 int, v2 int
>>> )
>>> PARTITION BY LIST(k1, k2)
>>> (
>>>     PARTITION p1 VALUES IN (("1", "beijing"), ("1", "shanghai")),
>>>     PARTITION p2 VALUES IN (("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")),
>>>     PARTITION p3 VALUES IN (("3", "beijing")),
>>>     ...
>>> )
>>> ...
>>> ;
>>>
>>> NOTE: Each partition needs to ensure that the partition values are
>>> unique.
>>>
>>> Add partition
>>>
>>> ALTER TABLE tb1 ADD PARTITION p4 VALUES IN ("7", "8", "9");
>>> ALTER TABLE tb2 ADD PARTITION p4 VALUES IN (("4", "tianjin"));
>>>
>>> Load
>>>
>>> The current load methods of Doris include Stream Load, INSERT, Routine
>>> Load, Broker Load, Hadoop Load, Spark Load.
>>>
>>> Among them, Stream Load, INSERT, Routine Load, and Broker Load all use
>>> TabletSink class for data distribution. Our first phase adds List
>>> partition support for these load operations.
>>>
>>> The main changes involved include:
>>>
>>> 1. Changes related to the Descriptors.TOlapTablePartitionParam structure
>>>    in the Thrift structure TOlapTableSink
>>> 2. Changes related to the OlapTablePartition object in the OlapTableSink
>>>    class on the BE side.
>>>
>>> Query
>>>
>>> The query mainly needs to implement the List Partition pruning function.
>>>
>>> The main changes involved include:
>>>
>>> 1. Implementing the subclass ListPartitionPruner of PartitionPruner
>>>
>>> Partition related
>>>
>>> Support operations related to partitioned tables, such as recover,
>>> truncate, temporary partition, restore, replace, etc.
>>>
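To make the proposal's three moving parts concrete (the uniqueness check on partition values, routing a row to a partition at load time, and pruning partitions at query time), here is a minimal Python sketch. It is only an illustration under stated assumptions: the class ListPartitionTable and its methods add_partition, route, and prune are hypothetical names and are not Doris's ListPartitionInfo, OlapTableSink, or ListPartitionPruner. It also assumes that predicates arrive as simple per-column sets of allowed values (from = or IN), and that a row whose key matches no partition is simply reported as unroutable.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# One value per partition column, e.g. ("1", "beijing") for LIST(k1, k2).
Key = Tuple[str, ...]


class ListPartitionTable:
    """Hypothetical in-memory model of a List partitioned table (not Doris code)."""

    def __init__(self, partition_columns: List[str]) -> None:
        self.partition_columns = partition_columns
        self._key_to_partition: Dict[Key, str] = {}

    def add_partition(self, name: str, keys: Iterable[Key]) -> None:
        """Register a partition, rejecting values already owned by any partition."""
        keys = list(keys)
        if name in self._key_to_partition.values():
            raise ValueError(f"partition {name!r} already exists")
        seen: Set[Key] = set()
        for key in keys:
            if len(key) != len(self.partition_columns):
                raise ValueError(f"key {key!r} does not match the partition columns")
            if key in seen or key in self._key_to_partition:
                raise ValueError(f"partition value {key!r} is not unique")
            seen.add(key)
        # Mutate only after every check has passed, so a failed ADD leaves no trace.
        for key in keys:
            self._key_to_partition[key] = name

    def route(self, row_key: Key) -> Optional[str]:
        """Load path: return the partition that owns this row, or None if none does."""
        return self._key_to_partition.get(row_key)

    def prune(self, predicates: Dict[str, Set[str]]) -> List[str]:
        """Query path: keep partitions with at least one key satisfying all predicates.

        `predicates` maps a column name to its allowed values; columns without
        an entry are unconstrained.
        """
        selected: Set[str] = set()
        for key, name in self._key_to_partition.items():
            if all(
                col not in predicates or value in predicates[col]
                for col, value in zip(self.partition_columns, key)
            ):
                selected.add(name)
        return sorted(selected)


# Usage, mirroring the tb2 example from the proposal.
tb2 = ListPartitionTable(["k1", "k2"])
tb2.add_partition("p1", [("1", "beijing"), ("1", "shanghai")])
tb2.add_partition("p2", [("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")])
tb2.add_partition("p3", [("3", "beijing")])

print(tb2.route(("2", "tianjin")))        # p2
print(tb2.prune({"k2": {"tianjin"}}))     # ['p2']
print(tb2.prune({"k1": {"1", "3"}}))      # ['p1', 'p3']
```

Keeping a single map from partition value to partition name makes the uniqueness rule, load routing, and pruning all fall out of the same structure. A real implementation would also have to handle typed keys, temporary partitions, and persistence, which this sketch leaves out.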