[
https://issues.apache.org/jira/browse/HUDI-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hui An updated HUDI-6077:
-------------------------
Description:
1. Implement some basic `Expression`s for HUDI.
2. Try to convert all Spark `Expression`s to HUDI `Expression`s.
3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to HUDI
`Expression`s.
4. Currently, we only support pushing down `EqualTo` filters on the first
partition level (by path prefix). This PR pushes down more complex
partition filters (such as `And`, `Or`, and `EqualTo`) when fetching partitions.
While listing partition paths in parallel, it uses `PartialBindVisitor` to bind
the partition levels that have already been listed and replaces the
still-unresolved references with `AlwaysTrue`.
e.g.
{code:java}
Given a table with 3 partition levels (year, month, day) and the existing
table partition paths:
year=2023/month=2/day=11
year=2023/month=2/day=12
year=2024/month=2/day=12
To push down the filter `year=2023 AND day=12`, we first list the first
partition level, `year`, and bind the `year` schema with `PartialBindVisitor`.
Since `day` is not yet available, the filter is rewritten to `year=2023 AND
TRUE` (optimized to `year=2023`), so the first 2 paths are selected.
Next, those 2 paths are listed in parallel. Since `day` is still not
available, both paths remain selected.
Finally, when the last partition level is listed, the full filter
`year=2023 AND day=12` is applied and returns `year=2023/month=2/day=12`.
{code}
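The walk-through above can be sketched in a few lines of Java. This is only an illustration of the partial-binding idea, not the code in the PR: the class `PartialBindSketch`, its nested `Expr`, `EqualTo`, and `And` types, and the `prune` and `parse` helpers are all hypothetical names, and directory listing is simulated by taking prefixes of a fixed list of paths. `Or` and negation are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: these classes only mimic the idea and are not Hudi's real API.
final class PartialBindSketch {

    abstract static class Expr {
        // Replace predicates on columns that are not bound yet with TRUE.
        abstract Expr partialBind(Set<String> boundColumns);
        abstract boolean eval(Map<String, String> partitionValues);
    }

    static final Expr TRUE = new Expr() {
        Expr partialBind(Set<String> boundColumns) { return this; }
        boolean eval(Map<String, String> partitionValues) { return true; }
        @Override public String toString() { return "TRUE"; }
    };

    static final class EqualTo extends Expr {
        final String column;
        final String value;
        EqualTo(String column, String value) { this.column = column; this.value = value; }
        Expr partialBind(Set<String> boundColumns) {
            return boundColumns.contains(column) ? this : TRUE;
        }
        boolean eval(Map<String, String> partitionValues) {
            return value.equals(partitionValues.get(column));
        }
        @Override public String toString() { return column + "=" + value; }
    }

    static final class And extends Expr {
        final Expr left;
        final Expr right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
        Expr partialBind(Set<String> boundColumns) {
            Expr l = left.partialBind(boundColumns);
            Expr r = right.partialBind(boundColumns);
            if (l == TRUE) { return r; } // TRUE AND r simplifies to r
            if (r == TRUE) { return l; } // l AND TRUE simplifies to l
            return new And(l, r);
        }
        boolean eval(Map<String, String> partitionValues) {
            return left.eval(partitionValues) && right.eval(partitionValues);
        }
        @Override public String toString() { return "(" + left + " AND " + right + ")"; }
    }

    // Parse "year=2023/month=2" into {year=2023, month=2}.
    static Map<String, String> parse(String prefix) {
        Map<String, String> values = new HashMap<>();
        for (String segment : prefix.split("/")) {
            String[] kv = segment.split("=", 2);
            values.put(kv[0], kv[1]);
        }
        return values;
    }

    // Prune level by level; "listing" is simulated with prefixes of the known paths.
    static List<String> prune(List<String> paths, List<String> levels, Expr filter) {
        Set<String> candidates = new LinkedHashSet<>();
        candidates.add(""); // the table root
        for (int depth = 1; depth <= levels.size(); depth++) {
            Expr partial = filter.partialBind(new HashSet<>(levels.subList(0, depth)));
            Set<String> next = new LinkedHashSet<>();
            for (String path : paths) {
                String[] parts = path.split("/");
                String parent = String.join("/", Arrays.copyOfRange(parts, 0, depth - 1));
                String prefix = String.join("/", Arrays.copyOfRange(parts, 0, depth));
                // Only descend into parents that survived the previous level.
                if (candidates.contains(parent) && partial.eval(parse(prefix))) {
                    next.add(prefix);
                }
            }
            candidates = next;
        }
        return new ArrayList<>(candidates);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "year=2023/month=2/day=11",
            "year=2023/month=2/day=12",
            "year=2024/month=2/day=12");
        Expr filter = new And(new EqualTo("year", "2023"), new EqualTo("day", "12"));
        System.out.println(prune(paths, Arrays.asList("year", "month", "day"), filter));
    }
}
```

With the three example paths, the sketch drops `year=2024` at the first level, keeps `year=2023/month=2` at the second, and applies the full filter only at the `day` level. Substituting TRUE for an unresolved predicate can only keep extra candidates under `And`; a negation would need different handling, which the sketch does not attempt.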
Because of the discussions
[discussion|https://github.com/apache/hudi/pull/8452#discussion_r1191971668]
[discussion|https://github.com/apache/hudi/pull/8452#discussion_r1199997268],
this feature takes effect if:
1. For {*}FileSystemBackedTableMetadata{*}:
{{hoodie.datasource.write.hive_style_partitioning}} and
{{hoodie.datasource.write.partitionpath.urlencode}} are both enabled
2. For {*}HoodieTableBackedTableMetadata{*}({{hoodie.metadata.enable}} is
true): only {{hoodie.datasource.write.hive_style_partitioning}} is enabled
> Add more partition push down filters
> ------------------------------------
>
> Key: HUDI-6077
> URL: https://issues.apache.org/jira/browse/HUDI-6077
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Hui An
> Priority: Major
> Labels: pull-request-available
--
This message was sent by Atlassian Jira
(v8.20.10#820010)