hudi-bot opened a new issue, #15898:
URL: https://github.com/apache/hudi/issues/15898
1. Implement some basic `Expression`s for HUDI
2. Try to convert all Spark `Expression`s to HUDI `Expression`s
3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to HUDI
`Expression`s
4. Currently, we only push down `EqualTo` filters, and only when they apply to
the first partition level (by path prefix). This PR pushes down more complex
partition filters (such as `And`, `Or`, and `EqualTo`) when fetching partitions.
While listing partition paths in parallel, it uses `PartialBindVisitor` to bind
the partition fields that have been listed so far, and replaces the references
that are still unresolved with `AlwaysTrue`.
For example:
{code:java}
Suppose the table has 3 partition levels: year, month, day, and the existing
table partition paths are:
year=2023/month=2/day=11
year=2023/month=2/day=12
year=2024/month=2/day=12
To push down the filter `year=2023 AND day=12`: when listing the first
partition level, `year`, only the `year` field is bound via
`PartialBindVisitor`. Since `day` is not yet available, the filter becomes
`year=2023 AND TRUE` (optimized to `year=2023`), so the first 2 paths are
selected.
Next, the level under those 2 paths is listed in parallel. Since `day` is still
not available, both paths remain selected.
Finally, when listing the last partition level, the full filter `year=2023 AND
day=12` is applied and returns `year=2023/month=2/day=12`
{code}
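The level-by-level pruning described above can be sketched roughly as follows. This is only an illustration, not Hudi's actual `Expression` or `PartialBindVisitor` API: the class `PartialBindSketch`, the `Expr` interface, and the helpers `eq`, `and`, `or`, `parse`, `prefix`, and `prune` are all hypothetical names, and binding and evaluation are collapsed into one `mayMatch` step for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of partial binding; not the real Hudi classes. */
public class PartialBindSketch {

    /** A filter that can be evaluated when only some partition fields are known. */
    interface Expr {
        boolean mayMatch(Map<String, String> known);
    }

    /** An unresolved reference behaves like AlwaysTrue, so it never prunes a path. */
    static Expr eq(String name, String value) {
        return known -> !known.containsKey(name) || value.equals(known.get(name));
    }

    static Expr and(Expr l, Expr r) {
        return known -> l.mayMatch(known) && r.mayMatch(known);
    }

    static Expr or(Expr l, Expr r) {
        return known -> l.mayMatch(known) || r.mayMatch(known);
    }

    /** Parses a hive-style path such as "year=2023/month=2" into field values. */
    static Map<String, String> parse(String path) {
        Map<String, String> fields = new HashMap<>();
        for (String segment : path.split("/")) {
            String[] kv = segment.split("=", 2);
            fields.put(kv[0], kv[1]);
        }
        return fields;
    }

    /** Returns the first `depth` segments of a partition path. */
    static String prefix(String path, int depth) {
        return String.join("/", Arrays.copyOf(path.split("/"), depth));
    }

    /** Simulates listing one level at a time, pruning with the partially bound filter. */
    static List<String> prune(List<String> allPaths, int levels, Expr filter) {
        List<String> survivors = new ArrayList<>();
        for (int depth = 1; depth <= levels; depth++) {
            final int d = depth;
            final List<String> parents = survivors;
            survivors = allPaths.stream()
                .filter(p -> d == 1 || parents.contains(prefix(p, d - 1))) // only list under kept parents
                .map(p -> prefix(p, d))
                .distinct()
                .filter(p -> filter.mayMatch(parse(p)))
                .collect(Collectors.toList());
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "year=2023/month=2/day=11",
            "year=2023/month=2/day=12",
            "year=2024/month=2/day=12");
        Expr filter = and(eq("year", "2023"), eq("day", "12"));
        System.out.println(prune(paths, 3, filter)); // prints [year=2023/month=2/day=12]
    }
}
```

Treating an unresolved reference as true is only safe here because the sketch has no negation: under a `Not`, replacing an unknown with true could wrongly prune a path.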
As a result of the discussions in
[discussion|https://github.com/apache/hudi/pull/8452#discussion_r1191971668] and
[discussion|https://github.com/apache/hudi/pull/8452#discussion_r1199997268],
this feature takes effect if:
1. For {*}FileSystemBackedTableMetadata{*}:
{{hoodie.datasource.write.hive_style_partitioning}} and
{{hoodie.datasource.write.partitionpath.urlencode}} are both enabled
2. For {*}HoodieTableBackedTableMetadata{*} ({{hoodie.metadata.enable}} is
true): only {{hoodie.datasource.write.hive_style_partitioning}} is enabled
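The two conditions above can be written as a small predicate. This is a sketch only: the class `PushdownConditionSketch` and its methods are hypothetical and not part of Hudi, and treating a missing key as `false` is an assumption of this example rather than a statement about Hudi's real defaults.

```java
import java.util.Properties;

/** Hypothetical helper that restates the two conditions; not a Hudi class. */
public class PushdownConditionSketch {
    static final String HIVE_STYLE = "hoodie.datasource.write.hive_style_partitioning";
    static final String URL_ENCODE = "hoodie.datasource.write.partitionpath.urlencode";
    static final String METADATA_ENABLE = "hoodie.metadata.enable";

    /** Reads a boolean flag; a missing key counts as false (assumption of this sketch). */
    static boolean flag(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }

    /** Returns true when the partition filter pushdown would take effect. */
    static boolean pushdownApplies(Properties props) {
        boolean hiveStyle = flag(props, HIVE_STYLE);
        if (flag(props, METADATA_ENABLE)) {
            // Metadata-table-backed listing: hive-style partitioning alone is enough.
            return hiveStyle;
        }
        // File-system-backed listing: both options must be enabled.
        return hiveStyle && flag(props, URL_ENCODE);
    }
}
```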
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-6077
- Type: Improvement