924060929 opened a new pull request, #18003:
URL: https://github.com/apache/doris/pull/18003
# Proposed changes
The legacy PartitionPruner only supports a few simple cases. Some useful cases
are not supported:
1. It cannot evaluate builtin functions, e.g. `cast(part_column as bigint) = 1`.
2. It cannot prune multi-level range partitions. For the partition
`[('1', 'a'), ('2', 'b'))`, we know that:
- first_part_column between '1' and '2'
- if first_part_column = '1' then second_part_column >= 'a'
- if first_part_column = '2' then second_part_column < 'b'
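
Conditions of this kind follow from the lexicographic ordering of multi-column partition keys. The snippet below is a minimal sketch of that reasoning, not the Doris implementation: it assumes keys can be modeled as Python tuples of strings, and `in_partition`, `second_column_bounds`, `LOWER` and `UPPER` are hypothetical names introduced only for illustration.

```python
from typing import Optional, Tuple

Key = Tuple[str, str]

# Hypothetical two-column range partition [('1', 'a'), ('2', 'b')).
LOWER: Key = ("1", "a")
UPPER: Key = ("2", "b")


def in_partition(row: Key, lower: Key = LOWER, upper: Key = UPPER) -> bool:
    """A range partition is half-open: lower <= row < upper, compared column by column."""
    return lower <= row < upper


def second_column_bounds(
    first: str, lower: Key = LOWER, upper: Key = UPPER
) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (inclusive_min, exclusive_max) for the second column given the
    value of the first column. A bound of None means "unbounded on that side".
    Return None when no row with this first value can be in the partition."""
    if not (lower[0] <= first <= upper[0]):
        return None
    # The second column is only constrained when the first column sits
    # exactly on the lower or upper bound of the partition.
    lo = lower[1] if first == lower[0] else None
    hi = upper[1] if first == upper[0] else None
    return (lo, hi)
```

With these assumptions, `second_column_bounds("1")` yields a lower bound of `'a'` only, `second_column_bounds("2")` yields an upper bound of `'b'` only, and any first value outside `'1'`..`'2'` rules the partition out.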
This PR refactors it and adds support for the following:
1. Use a Visitor to evaluate functions and fold constants.
2. If the partition column type is discrete, like int or date, we can expand
the range and evaluate each value, e.g. `[1, 5)` is expanded to `[1, 2, 3, 4]`.
3. Prune multi-level range partitions, as described above.
4. Evaluate functions over a range slot. For example, given the datetime range
partition `[('2023-03-21 00:00:00'), ('2023-03-21 23:59:59'))` and the filter
`date(col1) = '2023-03-22'`, the partition is pruned because `date(col1)` is
always `2023-03-21` within it. To support more functions of this kind, implement
the corresponding visit function in FoldConstantRuleOnFE and
OneRangePartitionEvaluator.
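
The datetime case can be illustrated with a small sketch. It is written in Python rather than the Java used by Doris, and it is only an approximation of the idea: it assumes that `date()` is monotonic, so if both ends of the range map to the same day, every value in between does too. The names `fold_date_over_range` and `can_prune` are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def fold_date_over_range(lower: datetime, upper: datetime) -> Optional[date]:
    """Fold date(slot) to a constant when slot is known to lie in [lower, upper).

    date() never decreases as its input grows, so equal results at both ends
    imply the same result for every value in between. Using the exclusive
    upper bound itself is a conservative (but sound) simplification."""
    if lower.date() == upper.date():
        return lower.date()
    return None  # the range spans several days, so date(slot) is not constant


def can_prune(lower: datetime, upper: datetime, wanted: date) -> bool:
    """True if the filter date(slot) = wanted can never hold in this partition."""
    folded = fold_date_over_range(lower, upper)
    return folded is not None and folded != wanted


# The partition and filter from the description above.
LOWER = datetime(2023, 3, 21, 0, 0, 0)
UPPER = datetime(2023, 3, 21, 23, 59, 59)
PRUNED = can_prune(LOWER, UPPER, date(2023, 3, 22))
```

Because the check is conservative, a partition whose upper bound is exactly midnight of the next day would not be folded by this sketch, even though all of its values fall on a single day.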
### Why can we do it so finely?
Generally, the columns of a range partition fall into three kinds: `const`,
`range`, and `other`.
For example, in the partition `[('1', 'a', 'D'), ('1', 'c', 'D'))`:
1. The first partition column is `const`: it always equals '1'.
2. The second partition column is `range`: `slot >= 'a' and slot <= 'c'`. If
there were no later column, it would be `slot >= 'a' and slot < 'c'`.
3. The third partition column is `other`: regardless of whether its upper and
lower bounds are the same, it can take multiple values, e.g. `('1', 'a', 'D')`,
`('1', 'a', 'F')`, `('1', 'b', 'A')`, `('1', 'c', 'A')`.
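
One way to picture this classification is the following sketch. It assumes the rule implied by the example: leading columns whose lower and upper bounds are equal are `const`, the first column whose bounds differ is `range`, and every column after that is `other`. The function name `classify_columns` is hypothetical and does not come from the PR.

```python
from typing import List, Sequence


def classify_columns(lower: Sequence, upper: Sequence) -> List[str]:
    """Label each partition column as 'const', 'range' or 'other'."""
    kinds: List[str] = []
    seen_range = False
    for lo, hi in zip(lower, upper):
        if seen_range:
            # Everything after the first differing column can take many values.
            kinds.append("other")
        elif lo == hi:
            # Bounds agree on this prefix column, so the column is fixed.
            kinds.append("const")
        else:
            kinds.append("range")
            seen_range = True
    return kinds


# The three-column partition used in the example above.
EXAMPLE = classify_columns(("1", "a", "D"), ("1", "c", "D"))
```

The sketch does not handle the degenerate case where every column has identical bounds, which would describe an empty partition.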
The properties of `const`:
1. We can replace the slot with the literal and evaluate the expression tree.
The properties of `range`:
1. If the slot's data type is discrete, like int or date, we can expand the
range into literals and evaluate the expression tree for each one.
2. If the type is not discrete, like datetime, or there are too many discrete
values, like `[1, 1000000)`, we keep the slot in the expression tree and assign
it a range. While evaluating the expression tree we also compute the range and
check whether it is the empty set; if so, the expression simplifies to
BooleanLiteral.FALSE and the partition is skipped.
3. If the range slot satisfies certain conditions, we can also fold functions
applied to the slot, see the datetime example above.
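
The first two `range` properties can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: ranges are modeled as half-open integer pairs, `MAX_EXPAND` is a made-up threshold for "too many discrete values", and `may_match_by_expansion` and `intersect` are hypothetical helpers.

```python
from typing import Callable, Optional, Tuple

Range = Tuple[int, int]  # half-open [lo, hi)

MAX_EXPAND = 100  # hypothetical limit on how many literals we are willing to enumerate


def may_match_by_expansion(lo: int, hi: int, pred: Callable[[int], bool]) -> Optional[bool]:
    """Property 1: expand [lo, hi) into literals and evaluate the predicate on each.

    Returns None when the range is too large to expand, meaning the caller
    should fall back to range reasoning instead."""
    if hi - lo > MAX_EXPAND:
        return None
    return any(pred(v) for v in range(lo, hi))


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Property 2: intersect the partition range with the range implied by the filter.

    None stands for the empty set, i.e. the filter folds to FALSE for this
    partition and the partition can be skipped."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


# [1, 5) expands to 1, 2, 3, 4; the filter `col = 5` matches none of them.
SMALL = may_match_by_expansion(1, 5, lambda v: v == 5)
# [1, 1000000) is too large to expand, so intersect it with `col >= 2000000` instead.
LARGE = intersect((1, 1_000_000), (2_000_000, 3_000_000))
```

In both cases the result says the partition cannot contain a matching row, so it can be pruned.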
The properties of `other`:
1. Only when the previous slot is a literal equal to the lower or upper bound
of the partition can we shrink the range of the `other` slot.
## Checklist(Required)
* [ ] Does it affect the original behavior
* [x] Has unit tests been added
* [ ] Has document been added or modified
* [ ] Does it need to update dependencies
* [ ] Does this PR support rollback (If NO, please explain WHY)