wangshuo128 opened a new pull request #7434:
URL: https://github.com/apache/incubator-doris/pull/7434


   ## Proposed changes
   For #7433
   This PR implements a V2 version of the partition pruning algorithm. The session variable `partition_prune_algorithm_version` selects which version is used; its default value is 1, so the V1 algorithm remains the default.
   
   ## Design notes
   ### Introduce `ColumnRange` to represent all the predicates on a column
   `ColumnRange` is an extension of the current `PartitionColumnFilter`.
   
   A predicate on a column is one of two kinds: an `is null` predicate, or a predicate on the column's non-null values, e.g., `col = 1`, `col > 2`, `col in (1, 2, 3)`.
   `ColumnRange` can represent both conjunctive and disjunctive combinations of these predicates. Its meaning is: `conjunctiveIsNull` AND (`rangeSet` OR `disjunctiveIsNull`).
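To make the non-null part (`rangeSet`) more concrete, here is a minimal Python sketch of how predicates could be turned into a set of ranges, with AND as intersection and OR as union. It assumes an integer column and models each range as a half-open interval `[lo, hi)`. The helpers `eq`, `gt`, `in_list`, `and_ranges`, `or_ranges` and the class `ColumnRangeSketch` are hypothetical illustrations, not the classes added in this PR; the two `is null` flags are only stored, not evaluated.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A half-open integer interval [lo, hi); +/-INF means unbounded.
Interval = Tuple[float, float]
INF = float("inf")


def eq(v: int) -> List[Interval]:
    # col = v  ->  [v, v + 1)
    return [(v, v + 1)]


def gt(v: int) -> List[Interval]:
    # col > v  ->  [v + 1, +inf) for an integer column
    return [(v + 1, INF)]


def in_list(values: List[int]) -> List[Interval]:
    # col in (v1, v2, ...)  ->  one point interval per distinct value
    return [(v, v + 1) for v in sorted(set(values))]


def and_ranges(a: List[Interval], b: List[Interval]) -> List[Interval]:
    # Conjunction: keep every non-empty pairwise intersection.
    result = []
    for lo1, hi1 in a:
        for lo2, hi2 in b:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo < hi:
                result.append((lo, hi))
    return sorted(result)


def or_ranges(a: List[Interval], b: List[Interval]) -> List[Interval]:
    # Disjunction: union, merging overlapping or adjacent intervals.
    merged: List[Interval] = []
    for lo, hi in sorted(a + b):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


@dataclass
class ColumnRangeSketch:
    # Hypothetical container mirroring the three parts named above.
    # The flags record an `is null` predicate combined with AND / OR;
    # evaluating them together with range_set is out of scope here.
    conjunctive_is_null: bool = False
    range_set: List[Interval] = field(default_factory=list)
    disjunctive_is_null: bool = False
```

For example, `col > 2 and col in (1, 2, 3)` becomes `and_ranges(gt(2), in_list([1, 2, 3]))`, i.e. `[(3, 4)]` (just `col = 3`), while `col = 1 or col > 5` becomes `[(1, 2), (6, inf)]`.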
   
   ### For single-column partitions, unify the pruning logic of list and range partitions
   1. Convert the partition keys of every partition to a `ColumnRange`, producing a candidate RangeMap `candidate` whose keys are the partitions' column ranges and whose values are the partition IDs.
   2. Apply the `ColumnRange` built from all the predicates to `candidate` to prune partitions.
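The two steps can be sketched in Python as follows, showing how range and list partitions both reduce to the same candidate map and the same pruning loop. This is a minimal illustration assuming an integer partition column and half-open intervals; `candidate_from_range_partitions`, `candidate_from_list_partitions` and `prune` are hypothetical names, and the candidate map is modeled as a sorted list of `(interval, partition_id)` pairs rather than a real RangeMap.

```python
from typing import Dict, List, Tuple

INF = float("inf")
Interval = Tuple[float, float]  # half-open [lo, hi) over integers
Candidate = List[Tuple[Interval, int]]


def candidate_from_range_partitions(parts: Dict[int, Interval]) -> Candidate:
    # A range partition already is one interval [lower, upper).
    return sorted((rng, pid) for pid, rng in parts.items())


def candidate_from_list_partitions(parts: Dict[int, List[int]]) -> Candidate:
    # Each listed value v becomes the point interval [v, v + 1).
    return sorted(((v, v + 1), pid) for pid, values in parts.items() for v in values)


def prune(candidate: Candidate, predicate: List[Interval]) -> List[int]:
    # Keep a partition if any of its intervals overlaps any predicate interval.
    kept = set()
    for (lo, hi), pid in candidate:
        for plo, phi in predicate:
            if max(lo, plo) < min(hi, phi):
                kept.add(pid)
                break
    return sorted(kept)


range_parts = {1: (-INF, 10), 2: (10, 20), 3: (20, 30)}
list_parts = {1: [1, 2], 2: [3], 3: [4, 5]}

# col > 15  ->  [16, +inf)
print(prune(candidate_from_range_partitions(range_parts), [(16, INF)]))
# col in (2, 5)  ->  [2, 3) and [5, 6)
print(prune(candidate_from_list_partitions(list_parts), [(2, 3), (5, 6)]))
```

The first call keeps partitions 2 and 3, the second keeps partitions 1 and 3. A linear scan keeps the sketch short; a sorted structure such as Guava's `TreeRangeMap` would avoid visiting every partition.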
   
   ### For **multiple-column partitions**, list partition pruning is improved and range partition pruning works as in V1
   Pruning multiple-column list partitions works a little differently from the single-column case.
   First, we group the partition ranges by the range of each column, so that they can be compared with the filters.
   Then, as for single-column partitions, we apply the ranges from the predicates to prune partitions.
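One reading of the grouping step is sketched below in Python: for each partition column, build a per-column candidate map from that column's values to partition IDs, prune with that column's predicate, and intersect the surviving partition sets across columns. It assumes integer columns and half-open intervals, and `group_by_column` and `prune_multi_list` are hypothetical names rather than the code in this PR.

```python
from typing import Dict, List, Set, Tuple

INF = float("inf")
Interval = Tuple[float, float]  # half-open [lo, hi) over integers
# partition id -> list of value tuples (k1, k2, ...)
ListParts = Dict[int, List[Tuple[int, ...]]]


def group_by_column(parts: ListParts, col: int) -> List[Tuple[Interval, int]]:
    # Per-column candidate: each value of column `col` -> owning partition id.
    return sorted({((t[col], t[col] + 1), pid) for pid, tuples in parts.items() for t in tuples})


def prune_multi_list(parts: ListParts, predicates: Dict[int, List[Interval]]) -> List[int]:
    # Start with all partitions; each column that has a predicate narrows the set.
    remaining: Set[int] = set(parts)
    for col, ranges in predicates.items():
        matched = {
            pid
            for (lo, hi), pid in group_by_column(parts, col)
            if any(max(lo, plo) < min(hi, phi) for plo, phi in ranges)
        }
        remaining &= matched
    return sorted(remaining)


parts = {1: [(1, 10), (1, 20)], 2: [(2, 10)], 3: [(3, 30)]}
# k1 in (1, 2) and k2 = 10
print(prune_multi_list(parts, {0: [(1, 3)], 1: [(10, 11)]}))
# k2 >= 20 (no predicate on k1)
print(prune_multi_list(parts, {1: [(20, INF)]}))
```

The first call keeps partitions 1 and 2, the second keeps partitions 1 and 3. Because each column is checked independently, the result can keep a partition that no single value tuple matches (a partition holding `(4, 40)` and `(5, 50)` survives `k1 = 4 and k2 = 50`); the pruning is conservative, so no matching data is skipped.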
   
   For **multiple-column range partitions**, it is harder to unify the pruning logic with that of multiple-column list partitions.
      
   The key point is that a list partition's values are explicit values of the partition columns, whereas in a multiple-column range partition the range bound of a partition column depends both on the range values of the other partition columns and on its own range values:
   
   Let's say we have two partition columns k1, k2:
   For partition [(1, 5), (1, 10)), the range for k2 is [5, 10).
   For partition [(1, 5), (2, 10)), the range for k2 is (-∞, +∞).
   For partition [(1, 10), (2, 5)), the range for k2 is (-∞, 5) union [10, +∞).
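The three k2 ranges above follow from comparing `(k1, k2)` lexicographically against the partition's lower (inclusive) and upper (exclusive) bounds. The Python sketch below derives them for a two-column range partition; it assumes integer columns and half-open intervals, and `second_column_range` and `merge` are hypothetical helpers, not part of this PR.

```python
from typing import List, Tuple

INF = float("inf")
Interval = Tuple[float, float]  # half-open [lo, hi) over integers


def merge(ranges: List[Interval]) -> List[Interval]:
    # Union of non-empty intervals, merging overlapping or adjacent ones.
    merged: List[Interval] = []
    for lo, hi in sorted(r for r in ranges if r[0] < r[1]):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def second_column_range(lower: Tuple[int, int], upper: Tuple[int, int]) -> List[Interval]:
    """Values k2 can take in the range partition [lower, upper) on (k1, k2).

    Assumes lower < upper; rows satisfy lower <= (k1, k2) < upper lexicographically.
    """
    (a1, a2), (b1, b2) = lower, upper
    if a1 == b1:
        # k1 is pinned to one value, so k2 is bounded on both sides.
        return merge([(a2, b2)])
    # k1 == a1 requires k2 >= a2; k1 == b1 requires k2 < b2.
    pieces = [(a2, INF), (-INF, b2)]
    if b1 - a1 > 1:
        # Some integer k1 lies strictly between the bounds, so any k2 is allowed.
        pieces.append((-INF, INF))
    return merge(pieces)


print(second_column_range((1, 5), (1, 10)))   # [5, 10)
print(second_column_range((1, 5), (2, 10)))   # (-inf, +inf)
print(second_column_range((1, 10), (2, 5)))   # (-inf, 5) union [10, +inf)
```

The k2 range depends on whether the k1 bounds are equal, adjacent, or further apart, which is why it cannot be read directly from the partition definition the way list values can.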
    
   In the future, we could compute the range bound of every column in a multiple-column range partition, so that multiple-column range partitions can be pruned with the same logic as multiple-column list partitions.
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   - [ ] Optimization. Including functional usability improvements and 
performance improvements.
   - [ ] Dependency. Such as changes related to third-party components.
   - [ ] Other.
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after 
creating the PR. If you're unsure about any of them, don't hesitate to ask. 
We're here to help! This is simply a reminder of what we are going to look for 
before merging your code._
   
   - [ ] I have created an issue on (Fix #ISSUE) and described the bug/feature 
there in detail
   - [ ] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature 
works
   - [ ] If these changes need document changes, I have updated the document
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[email protected] by explaining why you chose the solution you did and what 
alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
