wangshuo128 opened a new pull request #7434:
URL: https://github.com/apache/incubator-doris/pull/7434
## Proposed changes
For #7433
This PR proposes a V2 version of the partition prune algorithm. The session
variable `partition_prune_algorithm_version` is the control flag, with a
default value of 1.
## Design notes
### Introduce `ColumnRange` to represent all the predicates for a column.
It is an extension of the current `PartitionColumnFilter`.
There are two kinds of predicates for a column: the `is null` predicate, and
predicates that require the column value to be non-null, e.g., `col=1`,
`col>2`, `col in (1,2,3)`, etc.
`ColumnRange` can represent both conjunctive and disjunctive predicates for a
column. The combined meaning of the predicates is: `conjunctiveIsNull` AND
(`rangeSet` OR `disjunctiveIsNull`).
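
As a rough illustration of the formula `conjunctiveIsNull` AND (`rangeSet` OR `disjunctiveIsNull`), here is a minimal Python sketch. It is not the Doris implementation: the `Interval` helper, the field names, and the convention that a missing `range_set` means "no non-null predicate" are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical helper: one interval of non-null values."""
    lo: float = -math.inf
    hi: float = math.inf
    lo_closed: bool = False
    hi_closed: bool = False

    def contains(self, v) -> bool:
        above = v > self.lo or (self.lo_closed and v == self.lo)
        below = v < self.hi or (self.hi_closed and v == self.hi)
        return above and below


@dataclass
class ColumnRange:
    """Sketch of: conjunctiveIsNull AND (rangeSet OR disjunctiveIsNull)."""
    # Assumption: None means there is no non-null predicate on the column.
    range_set: Optional[List[Interval]] = None
    conjunctive_is_null: bool = False  # e.g. "... AND col IS NULL"
    disjunctive_is_null: bool = False  # e.g. "... OR col IS NULL"

    def matches(self, value) -> bool:
        # A conjunctive IS NULL rejects every non-null value.
        if self.conjunctive_is_null and value is not None:
            return False
        if self.range_set is None:
            return True
        # NULL never satisfies a comparison; only the disjunctive flag admits it.
        if value is None:
            return self.disjunctive_is_null
        return any(iv.contains(value) for iv in self.range_set)


# col > 2
gt2 = ColumnRange(range_set=[Interval(lo=2)])
# col > 2 OR col IS NULL
gt2_or_null = ColumnRange(range_set=[Interval(lo=2)], disjunctive_is_null=True)
# col IS NULL
only_null = ColumnRange(conjunctive_is_null=True)
```

Under these assumptions, `gt2` rejects NULL while `gt2_or_null` accepts it, and `only_null` accepts nothing but NULL.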
### For single-column partitions, unify the pruning logic for list and range partitions.
1. Convert the partition keys of every partition to a `ColumnRange` and build
a candidate RangeMap, `candidate`. Each key of `candidate` is a partition's
column range, and the corresponding value is the partition ID.
2. Apply the `ColumnRange` built from all the predicates to `candidate` to
prune partitions.
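
The two steps above can be sketched in Python as follows. This is a simplified illustration, not the code in this PR: a plain `dict` stands in for the RangeMap, NULL handling is left out, and `Interval`, `overlaps`, `singleton`, and `prune` are hypothetical names. A range partition contributes one half-open interval, and a list partition contributes one single-value interval per listed value, which is what lets both kinds share the same pruning step.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical interval; defaults to the half-open form [lo, hi)."""
    lo: float
    hi: float
    lo_closed: bool = True
    hi_closed: bool = False


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the intersection of a and b is non-empty."""
    # Lower bound of the intersection: the larger lower bound wins.
    if a.lo > b.lo:
        lo, lo_closed = a.lo, a.lo_closed
    elif a.lo < b.lo:
        lo, lo_closed = b.lo, b.lo_closed
    else:
        lo, lo_closed = a.lo, a.lo_closed and b.lo_closed
    # Upper bound of the intersection: the smaller upper bound wins.
    if a.hi < b.hi:
        hi, hi_closed = a.hi, a.hi_closed
    elif a.hi > b.hi:
        hi, hi_closed = b.hi, b.hi_closed
    else:
        hi, hi_closed = a.hi, a.hi_closed and b.hi_closed
    return lo < hi or (lo == hi and lo_closed and hi_closed)


def singleton(v) -> Interval:
    """A list-partition value expressed as the closed interval [v, v]."""
    return Interval(v, v, True, True)


def prune(candidate, predicate_ranges):
    """Step 2: keep the partitions whose range meets any predicate range."""
    return sorted({pid for key, pid in candidate.items()
                   if any(overlaps(key, p) for p in predicate_ranges)})


# Step 1 for a range-partitioned table: [0, 10), [10, 20), [20, 30).
RANGE_CANDIDATE = {Interval(0, 10): 1, Interval(10, 20): 2, Interval(20, 30): 3}
# Step 1 for a list-partitioned table: partition 100 IN (1, 2), partition 101 IN (3).
LIST_CANDIDATE = {singleton(1): 100, singleton(2): 100, singleton(3): 101}

# col >= 10 AND col < 15 keeps only partition 2.
kept_range = prune(RANGE_CANDIDATE, [Interval(10, 15)])
# col > 2 keeps only partition 101.
kept_list = prune(LIST_CANDIDATE, [Interval(2, math.inf, False, False)])
```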
### For **multiple-column partitions**, list partition pruning is improved and range partition pruning is the same as in V1.
For multiple-column list partitions, the logic differs slightly from the
single-column case.
First, we group partition ranges by the range of each column so that they can
be compared with the filters.
Then we apply the predicate ranges to prune partitions, as we do for
single-column partitions.
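
One way to read the grouping step is sketched below in Python. This is an interpretation for illustration only, not the code in this PR: each filter is a plain callable rather than a `ColumnRange`, and the function name and data layout are hypothetical. Candidates are grouped by their value in one partition column at a time, and only the groups accepted by that column's filter move on to the next column.

```python
from collections import defaultdict


def prune_multi_column_list(partitions, filters):
    """Prune multi-column list partitions column by column.

    partitions: {partition_id: [key_tuple, ...]}, one tuple per listed key.
    filters: one callable per partition column, or None if the column
             has no predicate (assumed layout for this sketch).
    """
    # Every (partition id, key tuple) pair starts as a candidate.
    survivors = [(pid, key) for pid, keys in partitions.items() for key in keys]
    for col, accept in enumerate(filters):
        if accept is None:
            continue
        # Group the remaining candidates by their value in this column,
        # so each distinct value is checked against the filter once.
        groups = defaultdict(list)
        for pid, key in survivors:
            groups[key[col]].append((pid, key))
        survivors = [item
                     for value, items in groups.items() if accept(value)
                     for item in items]
    # A partition is kept if at least one of its keys passed every filter.
    return sorted({pid for pid, _ in survivors})


PARTITIONS = {
    1: [(1, "bj"), (1, "sh")],
    2: [(2, "bj")],
    3: [(2, "sz"), (3, "sz")],
}

# k1 >= 2 AND k2 = 'sz' keeps only partition 3.
kept = prune_multi_column_list(PARTITIONS, [lambda k1: k1 >= 2,
                                            lambda k2: k2 == "sz"])
```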
For **multiple-column range partitions**, unifying the pruning logic with that
of list partitions is more complex.
The key point is that a list partition's values are explicit values of the
partition columns, whereas the range bound of a partition column in a
multiple-column range partition depends on both the range values of the other
partition columns and the column's own range value:
Let's say we have two partition columns k1, k2:
For partition [(1, 5), (1, 10)), the range for k2 is [5, 10).
For partition [(1, 5), (2, 10)), the range for k2 is (-∞, +∞).
For partition [(1, 10), (2, 5)), the range for k2 is (-∞, 5) union [10, +∞).
In the future, we could compute the range bound of every column in a
multiple-column range partition and then prune it with the same logic as
multiple-column list partitions.
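
The k2 ranges in the three examples above can be reproduced with a small Python sketch. It is not part of this PR and rests on stated assumptions: k1 is an integer column, the partition is the half-open range [lower, upper) under lexicographic ordering, and `k2_ranges` is a hypothetical helper name. Each returned pair `(lo, hi)` means lo <= k2 < hi.

```python
import math

INF = math.inf


def k2_ranges(lower, upper):
    """Possible k2 values for rows in the two-column partition [lower, upper).

    Assumes k1 is an integer column and bounds compare lexicographically.
    """
    (lo_k1, lo_k2), (hi_k1, hi_k2) = lower, upper
    if lower >= upper:
        raise ValueError("lower bound must be below upper bound")
    if lo_k1 == hi_k1:
        # k1 is fixed, so k2 is bounded on both sides.
        return [(lo_k2, hi_k2)]
    # Rows with k1 == lo_k1 allow k2 in [lo_k2, +inf).
    # Rows with k1 == hi_k1 allow k2 in (-inf, hi_k2).
    # Rows with k1 strictly in between allow any k2.
    if hi_k1 - lo_k1 > 1 or lo_k2 <= hi_k2:
        return [(-INF, INF)]
    return [(-INF, hi_k2), (lo_k2, INF)]


first = k2_ranges((1, 5), (1, 10))    # [5, 10)
second = k2_ranges((1, 5), (2, 10))   # (-inf, +inf)
third = k2_ranges((1, 10), (2, 5))    # (-inf, 5) union [10, +inf)
```

The last branch is the case that makes unification awkward: the bound on k2 is a union of two disjoint ranges that only exists because of the k1 bounds.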
## Types of changes
What types of changes does your code introduce to Doris?
_Put an `x` in the boxes that apply_
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Documentation Update (if none of the other choices apply)
- [ ] Code refactor (Modify the code structure, format the code, etc...)
- [ ] Optimization. Including functional usability improvements and
performance improvements.
- [ ] Dependency. Such as changes related to third-party components.
- [ ] Other.
## Checklist
_Put an `x` in the boxes that apply. You can also fill these out after
creating the PR. If you're unsure about any of them, don't hesitate to ask.
We're here to help! This is simply a reminder of what we are going to look for
before merging your code._
- [ ] I have created an issue on (Fix #ISSUE) and described the bug/feature
there in detail
- [ ] Compiling and unit tests pass locally with my changes
- [ ] I have added tests that prove my fix is effective or that my feature
works
- [ ] If these changes need document changes, I have updated the document
- [ ] Any dependent changes have been merged
## Further comments
If this is a relatively large or complex change, kick off the discussion at
[email protected] by explaining why you chose the solution you did and what
alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]