[ 
https://issues.apache.org/jira/browse/HUDI-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6077:
---------------------------------
    Labels: pull-request-available  (was: )

> Add more partition push down filters
> ------------------------------------
>
>                 Key: HUDI-6077
>                 URL: https://issues.apache.org/jira/browse/HUDI-6077
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Hui An
>            Priority: Major
>              Labels: pull-request-available
>
> 1. Implement some basic `Expression`s for Hudi
> 2. Try to convert all Spark `Expression`s to Hudi `Expression`s
> 3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to Hudi 
> `Expression`s
> 4. Currently, we only push down `EqualTo` filters on the first partition 
> level (by path prefix). This PR tries to push down more complex partition 
> filters (such as `And`, `Or`, and `EqualTo`) when fetching partitions. 
> While listing partition paths in parallel, it uses `PartialBindVisitor` to 
> bind the partition values listed so far and replaces the unresolved 
> references with `AlwaysTrue`.
> e.g.
> {code:java}
> Suppose the table has 3 partition levels: year, month, day, and the existing 
> table partition paths are:
> year=2023/month=2/day=11
> year=2023/month=2/day=12
> year=2024/month=2/day=12
> To push down the filter `year=2023 AND day=12`: when listing the first 
> partition level `year`, only the field `year` is bound by `PartialBindVisitor`.
> Since `day` is not available yet, the filter becomes `year=2023 AND 
> TRUE` (optimized to `year=2023`), so the first 2 paths are selected.
> Listing then continues in parallel under those 2 paths; since `day` is still 
> not available, both paths remain selected.
> Finally, when listing the last partition level, the full filter `year=2023 AND 
> day=12` is applied and returns `year=2023/month=2/day=12`.
> {code}
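
The level-by-level pruning described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Hudi's actual code: the `Eq`, `And`, and `Or` classes and the `mayMatch` and `prune` helpers are hypothetical names made up for this example, and the "listing" is simulated over an in-memory list of paths. It only assumes what the description states: references that cannot be resolved at the current partition level are treated as always true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartialBindSketch {
    // Hypothetical mini expression tree; these are NOT Hudi's Expression classes.
    interface Expr {}
    static final class Eq implements Expr {
        final String column, value;
        Eq(String column, String value) { this.column = column; this.value = value; }
    }
    static final class And implements Expr {
        final Expr left, right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
    }
    static final class Or implements Expr {
        final Expr left, right;
        Or(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Partial binding: a column that is not known yet evaluates to TRUE,
    // so a path prefix is dropped only when the filter is definitely false.
    static boolean mayMatch(Expr e, Map<String, String> known) {
        if (e instanceof Eq) {
            Eq eq = (Eq) e;
            return !known.containsKey(eq.column) || eq.value.equals(known.get(eq.column));
        }
        if (e instanceof And) {
            return mayMatch(((And) e).left, known) && mayMatch(((And) e).right, known);
        }
        if (e instanceof Or) {
            return mayMatch(((Or) e).left, known) || mayMatch(((Or) e).right, known);
        }
        throw new IllegalArgumentException("unsupported expression");
    }

    // Turns "year=2023/month=2" into {year=2023, month=2}.
    static Map<String, String> parse(String prefix) {
        Map<String, String> values = new HashMap<>();
        for (String segment : prefix.split("/")) {
            String[] kv = segment.split("=", 2);
            values.put(kv[0], kv[1]);
        }
        return values;
    }

    // Simulates listing one partition level at a time, keeping only the
    // prefixes whose parent survived and that may still match the filter.
    static List<String> prune(List<String> paths, int levels, Expr filter) {
        Set<String> selected = new LinkedHashSet<>();
        selected.add(""); // the table root
        for (int depth = 1; depth <= levels; depth++) {
            Set<String> next = new LinkedHashSet<>();
            for (String path : paths) {
                String[] segs = path.split("/");
                String parent = String.join("/", Arrays.copyOfRange(segs, 0, depth - 1));
                String child = String.join("/", Arrays.copyOfRange(segs, 0, depth));
                if (selected.contains(parent) && mayMatch(filter, parse(child))) {
                    next.add(child);
                }
            }
            selected = next;
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "year=2023/month=2/day=11",
            "year=2023/month=2/day=12",
            "year=2024/month=2/day=12");
        Expr filter = new And(new Eq("year", "2023"), new Eq("day", "12"));
        System.out.println(prune(paths, 3, filter)); // prints [year=2023/month=2/day=12]
    }
}
```

The sketch deliberately supports only `Eq`, `And`, and `Or`. Substituting TRUE for an unresolved reference is safe for these because it can only keep extra candidates; a negation operator would need separate handling, since negating the substituted TRUE could wrongly discard a prefix.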



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
