Hi JianLiang,

Looking forward to your detailed design and PR.

Ling Miao

On Mon, Feb 22, 2021 at 7:25 PM ye qi <jianliang5...@gmail.com> wrote:

> Hi, Ling Miao.
>
> Thanks for your advice.
> I'll think about it and get back to you.
>
> Jianliang Qi
>
> On Mon, Feb 22, 2021 at 5:31 PM ling miao <lingm...@apache.org> wrote:
>
>> Hi JianLiang,
>>
>> Thank you for your proposal. I think this feature is still necessary for
>> some large dimension tables.
>> It means that data that is not generated according to time can also be
>> partitioned.
>>
>> Of course, since this changes the metadata, all loads, queries, and
>> other DDL operations may need to be adapted.
>> Please take this into account when designing.
>>
>> Ling Miao
>>
>> On Sun, Feb 21, 2021 at 1:12 AM ye qi <jianliang5...@gmail.com> wrote:
>>
>>> List partition
>>>
>>> Doris currently only supports Range partitioning, where data is usually
>>> partitioned by time columns.
>>>
>>> However, in some scenarios, users want to partition by enumerated
>>> column values, such as city.
>>>
>>> Design
>>>
>>> To add support for List partitioning, the following functional points
>>> need to be considered:
>>>
>>>    1. Support for List partition syntax in creating table statements.
>>>    2. Support for syntax to add and drop List partitions.
>>>    3. Support for List partitioning in various load operations.
>>>    4. Support for List partition pruning during query.
>>>
>>> List partitioned tables do not need to consider dynamic partitioning.
>>>
>>> Detailed design
>>>
>>> Syntax
>>>
>>> The main changes involved here include:
>>>
>>>    1. Implementation of ListPartitionDesc, a subclass of the parsing
>>>    class PartitionDesc.
>>>    2. Implementation of ListPartitionInfo, a subclass of the metadata
>>>    class PartitionInfo.
>>>    3. Support for parsing and checking ListPartitionDesc in
>>>    CreateTableStmt.
>>>    4. Support for the creation of List partition tables in the Catalog
>>>    class.
>>>    5. Metadata persistence-related changes.
>>>
>>> The syntax follows that of MySQL and Oracle.
>>>
>>> Single partition column
>>>
>>> CREATE TABLE tb1 (
>>>     k1 int, k2 varchar(128), k3 int, v1 int, v2 int
>>> )
>>> PARTITION BY LIST(k1)
>>> (
>>>     PARTITION p1 VALUES IN ("1", "3", "5"),
>>>     PARTITION p2 VALUES IN ("2", "4", "6"),
>>>     ...
>>> )
>>> ...
>>> ;
>>>
>>> Multiple partition columns
>>>
>>> CREATE TABLE tb2 (
>>>     k1 int, k2 varchar(128), k3 int, v1 int, v2 int
>>> )
>>> PARTITION BY LIST(k1, k2)
>>> (
>>>     PARTITION p1 VALUES IN (("1", "beijing"), ("1", "shanghai")),
>>>     PARTITION p2 VALUES IN (("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")),
>>>     PARTITION p3 VALUES IN (("3", "beijing")),
>>>     ...
>>> )
>>> ...
>>> ;
>>>
>>> NOTE: Partition values must be unique. The same value must not appear
>>> in more than one partition.
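
The uniqueness rule in the NOTE can be checked when a table is created or a partition is added. Below is a minimal Python sketch of such a check. It is not Doris code: the dictionary layout and the function name `find_duplicate_values` are assumptions made for illustration, and the real check would presumably live in the FE analysis of ListPartitionDesc.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: partition name -> list of value tuples,
# one tuple element per partition column.
PartitionValues = Dict[str, List[Tuple[str, ...]]]


def find_duplicate_values(
    partitions: PartitionValues,
) -> List[Tuple[Tuple[str, ...], str, str]]:
    """Return (value, first_owner, second_owner) for every value listed twice."""
    owner: Dict[Tuple[str, ...], str] = {}
    conflicts = []
    for name, values in partitions.items():
        for value in values:
            if value in owner:
                # The value was already claimed by an earlier partition
                # (or earlier in this same partition).
                conflicts.append((value, owner[value], name))
            else:
                owner[value] = name
    return conflicts


# The tb2 layout from the CREATE TABLE example above.
tb2 = {
    "p1": [("1", "beijing"), ("1", "shanghai")],
    "p2": [("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")],
    "p3": [("3", "beijing")],
}

# A partition that reuses ("2", "tianjin") would violate the NOTE.
tb2_with_conflict = dict(tb2, p4=[("2", "tianjin")])
```

Running the check over `tb2` reports no conflicts, while `tb2_with_conflict` reports that `("2", "tianjin")` is claimed by both `p2` and `p4`.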
>>>
>>> Add partition
>>>
>>> ALTER TABLE tb1 ADD PARTITION p4 VALUES IN ("7", "8", "9");
>>> ALTER TABLE tb2 ADD PARTITION p4 VALUES IN (("4", "tianjin"));
>>>
>>> Load
>>>
>>> The current load methods of Doris include Stream Load, INSERT, Routine
>>> Load, Broker Load, Hadoop Load, and Spark Load.
>>>
>>> Among them, Stream Load, INSERT, Routine Load, and Broker Load all use
>>> the TabletSink class for data distribution. The first phase will add
>>> List partition support for these load operations.
>>>
>>> The main changes involved include:
>>>
>>>    1. Changes related to the Descriptors.TOlapTablePartitionParam
>>>    structure in the Thrift structure TOlapTableSink.
>>>    2. Changes related to the OlapTablePartition object in the
>>>    OlapTableSink class on the BE side.
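
To illustrate what the sink has to do for a List partitioned table, here is a minimal Python sketch of routing a row to its partition by exact lookup of the partition-column values. It does not reflect the actual Thrift or BE structures. The names `build_value_index` and `route_row` are hypothetical, and returning `None` for a row that matches no partition is an assumption, since the proposal does not say how such rows are handled.

```python
from typing import Dict, List, Optional, Sequence, Tuple

PartitionValues = Dict[str, List[Tuple[str, ...]]]


def build_value_index(partitions: PartitionValues) -> Dict[Tuple[str, ...], str]:
    """Invert the partition definition into value tuple -> partition name."""
    index: Dict[Tuple[str, ...], str] = {}
    for name, values in partitions.items():
        for value in values:
            index[value] = name
    return index


def route_row(
    row: Dict[str, str],
    partition_columns: Sequence[str],
    index: Dict[Tuple[str, ...], str],
) -> Optional[str]:
    """Return the partition owning this row, or None if no partition lists it."""
    key = tuple(row[column] for column in partition_columns)
    return index.get(key)


# The tb2 layout from the CREATE TABLE example above.
tb2 = {
    "p1": [("1", "beijing"), ("1", "shanghai")],
    "p2": [("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")],
    "p3": [("3", "beijing")],
}
tb2_index = build_value_index(tb2)
```

Unlike Range partitioning, which searches for the interval containing a key, this lookup is an exact match on the whole value tuple, so a hash map is sufficient in the sketch.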
>>>
>>> Query
>>>
>>> On the query side, the main task is to implement List partition pruning.
>>>
>>> The main changes involved include:
>>>
>>>    1. Implementing the subclass ListPartitionPruner of PartitionPruner
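
As a rough illustration of List partition pruning, the Python sketch below keeps a partition only if at least one of its value tuples is compatible with the equality or IN predicates on the partition columns. It is a sketch under stated assumptions, not the ListPartitionPruner design: the function name `prune_partitions` and the predicate format (column name mapped to a set of allowed values) are hypothetical, and other predicate types are ignored.

```python
from typing import Dict, List, Sequence, Set, Tuple

PartitionValues = Dict[str, List[Tuple[str, ...]]]


def prune_partitions(
    partitions: PartitionValues,
    partition_columns: Sequence[str],
    predicates: Dict[str, Set[str]],
) -> List[str]:
    """Return the names of partitions that may contain matching rows.

    predicates maps a column to the values allowed by "=" or "IN".
    A column missing from predicates is treated as unconstrained.
    """
    selected = []
    for name, values in partitions.items():
        for value in values:
            matches = all(
                column not in predicates or element in predicates[column]
                for column, element in zip(partition_columns, value)
            )
            if matches:
                # One compatible value tuple is enough to keep the partition.
                selected.append(name)
                break
    return selected


# The tb2 layout from the CREATE TABLE example above.
tb2 = {
    "p1": [("1", "beijing"), ("1", "shanghai")],
    "p2": [("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")],
    "p3": [("3", "beijing")],
}
```

For example, a filter such as `k2 = "tianjin"` keeps only `p2`, while a filter on a non-partition column leaves every partition selected.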
>>>
>>> Partition related
>>>
>>> Support operations related to partitioned tables, such as recover,
>>> truncate, temporary partition, restore, replace, etc.
>>>
>>
