[ 
https://issues.apache.org/jira/browse/HUDI-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An updated HUDI-6077:
-------------------------
    Description: 
1. Implement some basic `Expression`s for HUDI
2. Try to convert all Spark `Expression`s to HUDI `Expression`s
3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to HUDI 
`Expression`s
4. Currently, we only support pushing down `EqualTo` filters on the first 
partition level (by path prefix). This PR pushes down more complex 
partition filters (such as `And`, `Or`, `EqualTo`) when fetching partitions. 
While partition paths are listed in parallel, `PartialBindVisitor` binds the 
partition levels that have been listed so far and replaces the still-unresolved 
references with `AlwaysTrue`.
e.g.
{code:java}
Suppose the table has 3 partition levels: year, month, day, and the existing 
table partition paths are:
year=2023/month=2/day=11
year=2023/month=2/day=12
year=2024/month=2/day=12
To push down the filter `year=2023 AND day=12`, listing the first partition 
level `year` binds only the `year` schema through `PartialBindVisitor`.
Since `day` is not yet available, the filter is rewritten to `year=2023 AND 
TRUE` (optimized to `year=2023`), so the first 2 paths are selected.
Listing then continues in parallel under those 2 paths; since `day` is still 
not available at the `month` level, both paths remain selected.
Finally, when the last partition level is listed, the full filter 
`year=2023 AND day=12` is applied and returns `year=2023/month=2/day=12`.
{code}
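The walk-through above can be sketched in a few lines of Python. This is only an illustration of the partial-binding idea, not HUDI's Java implementation: the classes `EqualTo`, `And`, `Or` and the helpers `partial_eval`, `list_children` and `prune_partitions` are hypothetical names, the "file system" is an in-memory list of hive-style paths, and listing is done sequentially rather than in parallel.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class EqualTo:
    name: str   # partition column, e.g. "year"
    value: str  # expected value, compared as a string


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[EqualTo, And, Or]


def partial_eval(expr: Expr, bound: Dict[str, str]) -> bool:
    """Evaluate expr against the columns bound so far.

    A reference to a column that is not bound yet is treated as TRUE,
    mirroring the "unresolved reference -> AlwaysTrue" rule in the text.
    """
    if isinstance(expr, EqualTo):
        return expr.name not in bound or bound[expr.name] == expr.value
    if isinstance(expr, And):
        return partial_eval(expr.left, bound) and partial_eval(expr.right, bound)
    if isinstance(expr, Or):
        return partial_eval(expr.left, bound) or partial_eval(expr.right, bound)
    raise TypeError(f"unsupported expression: {expr!r}")


def list_children(all_paths: List[str], prefix: str) -> List[str]:
    """Stand-in for a directory listing: return prefixes one level deeper."""
    depth = 0 if not prefix else prefix.count("/") + 1
    children = set()
    for path in all_paths:
        segments = path.split("/")
        if "/".join(segments[:depth]) == prefix:
            children.add("/".join(segments[: depth + 1]))
    return sorted(children)


def prune_partitions(all_paths: List[str], num_levels: int, expr: Expr) -> List[str]:
    """List level by level, dropping prefixes the partially bound filter rejects."""
    survivors = [""]
    for _ in range(num_levels):
        next_level = []
        for prefix in survivors:
            for child in list_children(all_paths, prefix):
                # Hive-style segments ("col=value") give the bound columns.
                bound = dict(seg.split("=", 1) for seg in child.split("/"))
                if partial_eval(expr, bound):
                    next_level.append(child)
        survivors = next_level
    return survivors


paths = [
    "year=2023/month=2/day=11",
    "year=2023/month=2/day=12",
    "year=2024/month=2/day=12",
]
flt = And(EqualTo("year", "2023"), EqualTo("day", "12"))
print(prune_partitions(paths, 3, flt))  # ['year=2023/month=2/day=12']
```

Treating an unbound reference as TRUE is safe here because only `And`, `Or` and `EqualTo` are modelled; a negation over an unbound column would need extra care, since it could otherwise prune a prefix too early.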
Following this 
[discussion|https://github.com/apache/hudi/pull/8452#discussion_r1191971668], 
this feature takes effect if:

1. For {*}FileSystemBackedTableMetadata{*}: 
{{hoodie.datasource.write.hive_style_partitioning}} and 
{{hoodie.datasource.write.partitionpath.urlencode}} are both enabled
2. For {*}HoodieTableBackedTableMetadata{*} ({{hoodie.metadata.enable}} is 
true): only {{hoodie.datasource.write.hive_style_partitioning}} is enabled
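
The two conditions can be read as a small predicate over the table options. The sketch below is one possible reading, not HUDI code: `pushdown_applies` is a hypothetical helper, condition 2 is interpreted as "only hive-style partitioning is required", and a missing key is treated as disabled, which is not necessarily HUDI's actual default.

```python
from typing import Mapping

HIVE_STYLE = "hoodie.datasource.write.hive_style_partitioning"
URL_ENCODE = "hoodie.datasource.write.partitionpath.urlencode"
METADATA_ENABLE = "hoodie.metadata.enable"


def pushdown_applies(options: Mapping[str, str]) -> bool:
    """Return True if the conditions listed above are met (sketch only)."""

    def enabled(key: str) -> bool:
        # Assumption: absent keys count as "false" in this sketch.
        return str(options.get(key, "false")).lower() == "true"

    if enabled(METADATA_ENABLE):
        # Metadata-table-backed listing: hive-style partitioning is enough.
        return enabled(HIVE_STYLE)
    # File-system-backed listing: both options must be enabled.
    return enabled(HIVE_STYLE) and enabled(URL_ENCODE)


print(pushdown_applies({HIVE_STYLE: "true", URL_ENCODE: "true"}))
```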

> Add more partition push down filters
> ------------------------------------
>
>                 Key: HUDI-6077
>                 URL: https://issues.apache.org/jira/browse/HUDI-6077
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Hui An
>            Priority: Major
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
