Hui An created HUDI-6077:
----------------------------
Summary: Add more partition push down filters
Key: HUDI-6077
URL: https://issues.apache.org/jira/browse/HUDI-6077
Project: Apache Hudi
Issue Type: Improvement
Reporter: Hui An
1. Implement a basic set of `Expression` classes for Hudi.
2. Convert Spark `Expression`s to Hudi `Expression`s wherever possible.
3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to Hudi
`Expression`s.
4. Currently, `EqualTo` filters are pushed down only when they apply to the
first partition level (by path prefix). This PR pushes down more complex
partition filters (such as `And`, `Or`, and `EqualTo`) when fetching partitions.
While partition paths are listed in parallel, `PartialBindVisitor` binds the
partition columns that have been listed so far and replaces predicates on
still-unresolved references with `AlwaysTrue`.
e.g.
{code:java}
Suppose the table has 3 partition levels: year, month, day, and the existing
table partition paths are:
year=2023/month=2/day=11
year=2023/month=2/day=12
year=2024/month=2/day=12
To push down the filter `year=2023 AND day=12`: when listing the first
partition level `year`, only the `year` column is bound by `PartialBindVisitor`.
Since `day` is not yet available, the filter is rewritten to `year=2023 AND
TRUE` (optimized to `year=2023`), so the first 2 paths are selected.
The next level under those 2 paths is then listed in parallel. Since `day` is
still not available, both paths remain selected.
Finally, when the last partition level is listed, the full filter
`year=2023 AND day=12` is applied and returns `year=2023/month=2/day=12`.
{code}
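The level-by-level pruning in the example above can be sketched roughly as
follows. This is a minimal, self-contained illustration and not the Hudi
implementation: `PartialBindSketch`, `Expr`, `equalTo`, `and`, `or`, and `prune`
are hypothetical names, the "listing" is simulated over an in-memory list of
paths, and binding and evaluation are fused into one step, with a predicate on
a column that is not yet known simply evaluating to true (the `AlwaysTrue`
substitution).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Hudi code base.
class PartialBindSketch {

    /** A filter evaluated against the partition columns known so far. */
    interface Expr {
        boolean test(Map<String, String> known);
    }

    // An unresolved column makes the predicate behave like AlwaysTrue.
    static Expr equalTo(String column, String value) {
        return known -> !known.containsKey(column) || value.equals(known.get(column));
    }

    static Expr and(Expr left, Expr right) {
        return known -> left.test(known) && right.test(known);
    }

    static Expr or(Expr left, Expr right) {
        return known -> left.test(known) || right.test(known);
    }

    /** Returns the first `level` segments of a path such as "year=2023/month=2". */
    static String prefix(String path, int level) {
        String[] parts = path.split("/");
        return String.join("/", Arrays.copyOfRange(parts, 0, level));
    }

    /** Parses "year=2023/month=2" into {year=2023, month=2}. */
    static Map<String, String> parse(String path) {
        Map<String, String> values = new HashMap<>();
        for (String segment : path.split("/")) {
            String[] kv = segment.split("=", 2);
            values.put(kv[0], kv[1]);
        }
        return values;
    }

    /** Simulates listing one partition level at a time, pruning at each level. */
    static List<String> prune(List<String> allPaths, int levels, Expr filter) {
        Set<String> selected = new LinkedHashSet<>();
        selected.add(""); // the table root
        for (int level = 1; level <= levels; level++) {
            Set<String> next = new LinkedHashSet<>();
            for (String path : allPaths) {
                String parent = prefix(path, level - 1);
                String child = prefix(path, level);
                // Only descend into parents that survived the previous level.
                if (selected.contains(parent) && filter.test(parse(child))) {
                    next.add(child);
                }
            }
            selected = next;
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "year=2023/month=2/day=11",
            "year=2023/month=2/day=12",
            "year=2024/month=2/day=12");
        Expr filter = and(equalTo("year", "2023"), equalTo("day", "12"));
        System.out.println(prune(paths, 3, filter));
    }
}
```

Treating an unresolved predicate as true is only safe here because the sketch
has no negation; a `Not` over an unresolved reference would also have to stay
true for the pruning to remain conservative.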
--
This message was sent by Atlassian Jira
(v8.20.10#820010)