qidaye opened a new issue #5402:
URL: https://github.com/apache/incubator-doris/issues/5402


   # List partition
   
   Doris currently only supports Range partitioning, where data is usually 
partitioned by time columns. 
   
   However, in some scenarios, users want to partition by some enumerated 
values of columns, such as by city, etc.
   
   # Design 
   
   To add support for List partitioning, the following features need to be 
considered:
   
   1. Support for List partition syntax in create table statements.
   2. Support for syntax to add and drop List partitions.
   3. Support for List partitioning in various load operations.
   4. Support for List partition pruning during query.
   
   List partitioned tables do not need to consider dynamic partitioning.
   
   # Detailed design
   
   ## Syntax
   
   The main changes involved here include:
   
   1. Implementing ListPartitionDesc, a subclass of the parsing class 
PartitionDesc.
   2. Implementing ListPartitionInfo, a subclass of the metadata class 
PartitionInfo.
   3. Support for parsing and checking ListPartitionDesc in CreateTableStmt.
   4. Support for creating List partitioned tables in the Catalog class.
   5. Metadata persistence-related changes.
   
   The syntax is modeled on that of MySQL and Oracle.
   
   ### Single partition column 
   
   ```sql
   CREATE TABLE tb1 (
       k1 int, k2 varchar(128), k3 int, v1 int, v2 int
   )
   PARTITION BY LIST(k1)
   (
       PARTITION p1 VALUES IN ("1", "3", "5"),
       PARTITION p2 VALUES IN ("2", "4", "6"),
       ...
   )
   ...
   ;
   ```
   
   ### Multiple partition columns
   
   ```sql
   CREATE TABLE tb2 (
       k1 int, k2 varchar(128), k3 int, v1 int, v2 int
   )
   PARTITION BY LIST(k1, k2)
   (
       PARTITION p1 VALUES IN (("1", "beijing"), ("1", "shanghai")),
       PARTITION p2 VALUES IN (("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")),
       PARTITION p3 VALUES IN (("3", "beijing")),
       ...
   )
   ...
   ;
   ``` 
   
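   **NOTE**: Partition values must be unique across the partitions of a 
table: the same value (or value tuple, for multiple partition columns) must 
not appear in more than one partition.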
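
The uniqueness rule could be checked at table creation and when adding a partition. The following is only a minimal sketch of such a check, not code from Doris: the class `ListPartitionValueChecker` and its method are hypothetical names, and partition values are modeled as plain string tuples for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Doris): reports partition value tuples
// that are declared more than once.
class ListPartitionValueChecker {
    // Input maps partition name -> value tuples. Each tuple holds one entry
    // per partition column, e.g. ["1", "beijing"] for LIST(k1, k2).
    static List<String> findConflicts(Map<String, List<List<String>>> partitions) {
        // Remembers which partition first declared each tuple.
        Map<List<String>, String> owner = new HashMap<>();
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> e : partitions.entrySet()) {
            for (List<String> tuple : e.getValue()) {
                String previous = owner.putIfAbsent(tuple, e.getKey());
                if (previous != null) {
                    // Also triggers for a tuple repeated inside one partition.
                    conflicts.add(tuple + " in " + previous + " and " + e.getKey());
                }
            }
        }
        return conflicts;
    }
}
```

An empty result means the definition is valid; for `ADD PARTITION`, the same check would run over the existing partitions plus the new one.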
   
   ### Add partition 
   
   ```sql
   ALTER TABLE tb1 ADD PARTITION p4 VALUES IN ("7", "8", "9");
   
   ALTER TABLE tb2 ADD PARTITION p4 VALUES IN (("4", "tianjin"));
   ```
   
   ## Load
   
   The current load methods of Doris include Stream Load, INSERT, Routine Load, 
Broker Load, Hadoop Load, Spark Load.
   
   Among them, Stream Load, INSERT, Routine Load, and Broker Load all use the 
TabletSink class for data distribution. The first phase will add List partition 
support for these load operations.
   
   The main changes involved include: 
   
   1. Changes related to the Descriptors.TOlapTablePartitionParam structure in 
the Thrift structure TOlapTableSink
   2. Changes related to the OlapTablePartition object in the OlapTableSink 
class on the BE side.
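
Conceptually, distributing a row to a List partition is an exact-match lookup on the row's partition column values, in contrast to the range search used for Range partitions. The sketch below illustrates that idea only. It is written in Java for readability, although the real sink runs on the BE side, and `ListPartitionRouter` with its methods is a hypothetical name, not the actual OlapTablePartition structure.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of row routing for List partitions.
class ListPartitionRouter {
    // Maps one tuple of partition column values to the owning partition id.
    private final Map<List<String>, Long> tupleToPartitionId = new HashMap<>();

    // Registers all value tuples of one partition; rejects duplicates.
    void addPartition(long partitionId, List<List<String>> tuples) {
        for (List<String> tuple : tuples) {
            if (tupleToPartitionId.putIfAbsent(tuple, partitionId) != null) {
                throw new IllegalArgumentException("duplicate partition value: " + tuple);
            }
        }
    }

    // Returns the partition id for a row, or empty if no partition lists
    // the row's values (how such rows are handled is left to the caller).
    Optional<Long> route(List<String> partitionColumnValues) {
        return Optional.ofNullable(tupleToPartitionId.get(partitionColumnValues));
    }
}
```

With the `tb1` example above, a row with `k1 = 3` would route to `p1`, while a row with `k1 = 7` would match no partition until `p4` is added.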
   
   ## Query
   
   On the query side, the main work is implementing List partition pruning.
   
   The main changes involved include: 
   
   1. Implementing the subclass ListPartitionPruner of PartitionPruner
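
As a rough illustration of what pruning means for List partitions: given the set of values a query's filter allows for the partition column (for example from `k1 IN (1, 2)` or `k1 = 3`), only partitions that list at least one of those values need to be scanned. The sketch below covers a single partition column only; `ListPruneSketch` is a hypothetical name and does not reflect the actual PartitionPruner interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-column pruning sketch, not the Doris implementation.
class ListPruneSketch {
    // partitions: partition name -> values listed for the partition column.
    // predicateValues: values the query filter allows for that column.
    // Returns the names of partitions that may contain matching rows.
    static List<String> prune(Map<String, Set<String>> partitions,
                              Set<String> predicateValues) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : partitions.entrySet()) {
            for (String value : predicateValues) {
                if (e.getValue().contains(value)) {
                    kept.add(e.getKey());
                    break; // one shared value is enough to keep the partition
                }
            }
        }
        return kept;
    }
}
```

A filter that cannot be reduced to a finite set of values would have to fall back to scanning all partitions; that case is outside this sketch.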
   
   ## Partition related
   
   Existing partition-related operations, such as recover, truncate, 
temporary partition, restore, and replace, also need to support List 
partitioned tables.

