[
https://issues.apache.org/jira/browse/HUDI-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-6077:
---------------------------------
Labels: pull-request-available (was: )
> Add more partition push down filters
> ------------------------------------
>
> Key: HUDI-6077
> URL: https://issues.apache.org/jira/browse/HUDI-6077
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Hui An
> Priority: Major
> Labels: pull-request-available
>
> 1. Implement some basic `Expression`s for Hudi.
> 2. Try to convert all Spark `Expression`s to Hudi `Expression`s.
> 3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to Hudi
> `Expression`s.
> 4. Currently, we only support pushing down `EqualTo` filters on the first
> partition level (by path prefix). This PR tries to push down more complex
> partition filters (such as `And`, `Or`, and `EqualTo`) when fetching partitions.
> While partition paths are listed in parallel, `PartialBindVisitor` binds the
> partition values that have been listed so far and replaces the still-unresolved
> references with `AlwaysTrue`.
> e.g.
> {code:java}
> Suppose the table has 3 partition levels: year, month, day, and the existing
> table partition paths are:
> year=2023/month=2/day=11
> year=2023/month=2/day=12
> year=2024/month=2/day=12
> To push down the filter `year=2023 AND day=12`, we first list the first
> partition level, `year`, and bind only the `year` field in `PartialBindVisitor`.
> Since `day` is not yet available, the filter becomes `year=2023 AND TRUE`
> (simplified to `year=2023`), so the first 2 paths are selected.
> Next, those 2 paths are listed in parallel at the `month` level. `day` is still
> not available, so both paths remain selected.
> Finally, when the last partition level is listed, the full filter `year=2023 AND
> day=12` is applied and returns `year=2023/month=2/day=12`.
> {code}
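The level-by-level pruning in the example above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not Hudi's implementation: the `Expr`, `Const`, `EqualTo`, `And`, and `Or` types and the `listPartitions` helper are hypothetical stand-ins, the partition paths come from an in-memory list rather than a file system listing, every path is assumed to have the same depth, and the listing is sequential instead of parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT Hudi's actual classes.
final class PartialBindSketch {

    /** A filter node that can be bound to the partition values known so far. */
    interface Expr {
        Expr partialBind(Map<String, String> values);
    }

    /** Constant outcomes of binding. */
    enum Const implements Expr {
        TRUE, FALSE;
        @Override public Expr partialBind(Map<String, String> values) { return this; }
    }

    static final class EqualTo implements Expr {
        final String column;
        final String value;
        EqualTo(String column, String value) { this.column = column; this.value = value; }
        @Override public Expr partialBind(Map<String, String> values) {
            // An unresolved reference becomes TRUE, so it cannot prune anything yet.
            if (!values.containsKey(column)) return Const.TRUE;
            return value.equals(values.get(column)) ? Const.TRUE : Const.FALSE;
        }
    }

    static final class And implements Expr {
        final Expr left;
        final Expr right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public Expr partialBind(Map<String, String> values) {
            Expr l = left.partialBind(values);
            Expr r = right.partialBind(values);
            if (l == Const.FALSE || r == Const.FALSE) return Const.FALSE;
            if (l == Const.TRUE) return r;   // `x AND TRUE` simplifies to `x`
            if (r == Const.TRUE) return l;
            return new And(l, r);
        }
    }

    static final class Or implements Expr {
        final Expr left;
        final Expr right;
        Or(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public Expr partialBind(Map<String, String> values) {
            Expr l = left.partialBind(values);
            Expr r = right.partialBind(values);
            if (l == Const.TRUE || r == Const.TRUE) return Const.TRUE;
            if (l == Const.FALSE) return r;  // `FALSE OR x` simplifies to `x`
            if (r == Const.FALSE) return l;
            return new Or(l, r);
        }
    }

    /** First `levels` segments of a path, e.g. prefix("a=1/b=2/c=3", 2) is "a=1/b=2". */
    static String prefix(String path, int levels) {
        return String.join("/", Arrays.asList(path.split("/")).subList(0, levels));
    }

    /** Parses "year=2023/month=2" into {year=2023, month=2}. */
    static Map<String, String> parse(String prefix) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : prefix.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) values.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return values;
    }

    /** Walks the partition levels top-down, dropping prefixes whose bound filter is FALSE. */
    static List<String> listPartitions(List<String> allPaths, Expr filter, int depth) {
        Set<String> selected = new LinkedHashSet<>(List.of(""));
        for (int level = 1; level <= depth; level++) {
            Set<String> next = new LinkedHashSet<>();
            for (String path : allPaths) {
                String candidate = prefix(path, level);
                boolean parentKept = selected.contains(prefix(path, level - 1));
                if (parentKept && filter.partialBind(parse(candidate)) != Const.FALSE) {
                    next.add(candidate);
                }
            }
            selected = next;
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
            "year=2023/month=2/day=11",
            "year=2023/month=2/day=12",
            "year=2024/month=2/day=12");
        Expr filter = new And(new EqualTo("year", "2023"), new EqualTo("day", "12"));
        System.out.println(listPartitions(paths, filter, 3)); // [year=2023/month=2/day=12]
    }
}
```

Note that replacing an unresolved reference with TRUE only over-selects when the filter is built from `And`, `Or`, and comparisons, as in this sketch. Under a negation it could wrongly prune a partition, so a fuller implementation would need to treat `Not` separately.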