Davis Zhang created HUDI-9648:
---------------------------------
Summary: Partitioned RLI should take the partition column value as a hint
Key: HUDI-9648
URL: https://issues.apache.org/jira/browse/HUDI-9648
Project: Apache Hudi
Issue Type: Bug
Components: index
Reporter: Davis Zhang
Fix For: 1.2.0
For partitioned RLI, or any other partitioned index, we should be able to take
a hint about which partitions to look into.
For queries like
select a from t1 join t2 on t1.recKey = t2.c1 and t1.partitionCol=t2.c2
the query engine already knows which partitions are relevant, and Spark today
already performs dynamic partition pruning on top of that; the query engine has
this information at hand.
But today, even for an index join, the way we combine partition pruning and
index pruning is inefficient: each pruning path prunes files separately, and we
then intersect the results to figure out what to read. There is room for
improvement if we allow deep integration between partition pruning and
partitioned RLI by simply telling RLI which partitions to focus on.
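To make the contrast concrete, here is a minimal Python sketch under a toy assumption: a partitioned RLI is modeled in memory as partition path -> {record key -> file id}. The names `prune_separately` and `prune_with_hint` are hypothetical and are not Hudi APIs. When file ids are unique, both return the same files, but the hinted version only probes the index partitions named in the hint instead of every partition.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical in-memory model of a partitioned record-level index (RLI):
# partition path -> {record key -> file id}. This is NOT Hudi's actual API.
PartitionedRLI = Dict[str, Dict[str, str]]


def prune_separately(
    rli: PartitionedRLI,
    files_by_partition: Dict[str, List[str]],
    record_keys: Set[str],
    partitions: Set[str],
) -> Tuple[Set[str], int]:
    """Current shape: each pruning path runs on its own, then results are intersected."""
    # Path 1: partition pruning keeps every file in the surviving partitions.
    partition_files = {f for p in partitions for f in files_by_partition.get(p, [])}
    # Path 2: index pruning looks up the keys in every partition of the RLI.
    index_files: Set[str] = set()
    probed = 0
    for shard in rli.values():
        probed += 1
        index_files.update(shard[k] for k in record_keys if k in shard)
    # Only the overlap of the two results is read.
    return partition_files & index_files, probed


def prune_with_hint(
    rli: PartitionedRLI,
    record_keys: Set[str],
    partition_hint: Iterable[str],
) -> Tuple[Set[str], int]:
    """Proposed shape: the RLI only consults the partitions named in the hint."""
    files: Set[str] = set()
    probed = 0
    for p in partition_hint:
        shard = rli.get(p)
        if shard is None:
            continue  # hinted partition has no index entries
        probed += 1
        files.update(shard[k] for k in record_keys if k in shard)
    return files, probed
```

For example, with two RLI partitions and a hint naming only one of them, both functions return the same file set, but `prune_separately` probes both index partitions while `prune_with_hint` probes one.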
I also suggest making this partition hint a general hint, since other indexes
may also be able to use this information in the future.
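One way to picture a general hint is a single hint type that any partitioned index accepts. The Python sketch below is an illustration only: `PartitionHint`, `PartitionedIndex`, `RecordIndex`, `KeySetIndex` and `lookup` are hypothetical names, and `KeySetIndex` is a toy stand-in for some other per-file index (for example a bloom-filter-like one), not a real Hudi index.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Protocol, Set


@dataclass(frozen=True)
class PartitionHint:
    """Hypothetical engine-supplied hint; None means 'no hint, consider every partition'."""
    partitions: Optional[FrozenSet[str]] = None

    def allows(self, partition: str) -> bool:
        return self.partitions is None or partition in self.partitions


class PartitionedIndex(Protocol):
    """Hypothetical common contract: every partitioned index accepts the same hint."""
    def candidate_files(self, keys: Set[str], hint: PartitionHint) -> Set[str]: ...


class RecordIndex:
    """Toy partitioned RLI: partition -> {record key -> file id}."""
    def __init__(self, shards: Dict[str, Dict[str, str]]) -> None:
        self.shards = shards

    def candidate_files(self, keys: Set[str], hint: PartitionHint) -> Set[str]:
        return {
            shard[k]
            for p, shard in self.shards.items()
            if hint.allows(p)  # skip partitions outside the hint
            for k in keys
            if k in shard
        }


class KeySetIndex:
    """Toy stand-in for another per-file index:
    partition -> {file id -> keys that may be in the file}."""
    def __init__(self, entries: Dict[str, Dict[str, Set[str]]]) -> None:
        self.entries = entries

    def candidate_files(self, keys: Set[str], hint: PartitionHint) -> Set[str]:
        return {
            file_id
            for p, files in self.entries.items()
            if hint.allows(p)  # the same hint prunes this index too
            for file_id, file_keys in files.items()
            if not keys.isdisjoint(file_keys)
        }


def lookup(index: PartitionedIndex, keys: Set[str], hint: PartitionHint) -> Set[str]:
    # The caller passes the hint without knowing which index type it talks to.
    return index.candidate_files(keys, hint)
```

Keeping the hint as its own type, rather than an RLI-specific parameter, means a new index only has to honor `allows()` to benefit from partition information the engine already has.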
If this is worth a retro, let's create a CU to track it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)