Davis Zhang created HUDI-9648:
---------------------------------

             Summary: Partitioned RLI should take partition column value as a hint
                 Key: HUDI-9648
                 URL: https://issues.apache.org/jira/browse/HUDI-9648
             Project: Apache Hudi
          Issue Type: Bug
          Components: index
            Reporter: Davis Zhang
             Fix For: 1.2.0


For partitioned RLI (or any other partitioned index), we should be able to take 
a hint about which partitions to look into.
For queries like
select a from t1 join t2 on t1.recKey = t2.c1 and t1.partitionCol=t2.c2

 
the query engine knows what the partitions could be, and Spark already does 
dynamic partition pruning on top of that, so the query engine has this info handy.
But today, even for an index join, the way we combine partition pruning and index 
pruning is inefficient: each pruning path prunes files separately, and then we 
take the overlap of the results to figure out what to read. There would be room 
for improvement if we allowed deep integration between partition pruning and 
partitioned RLI by simply telling RLI which partitions to focus on.
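
To make the difference concrete, here is a minimal toy sketch in Python. It is 
not Hudi code: ToyPartitionedIndex, prune_separately, prune_with_hint and the 
sample data are all hypothetical names made up for illustration. It assumes the 
index is stored as one shard per partition and simply counts how many shards 
each approach has to probe.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class ToyPartitionedIndex:
    """Hypothetical stand-in for a partitioned record-level index:
    one shard per partition, each mapping record key -> data file."""
    shards: Dict[str, Dict[str, str]]
    shards_probed: int = 0

    def lookup(self, keys: Iterable[str],
               partition_hint: Optional[Iterable[str]] = None) -> Set[str]:
        # Without a hint, every shard has to be probed.
        partitions = list(self.shards) if partition_hint is None else list(partition_hint)
        keys = list(keys)
        files: Set[str] = set()
        for partition in partitions:
            shard = self.shards.get(partition)
            if shard is None:
                continue
            self.shards_probed += 1
            files.update(shard[k] for k in keys if k in shard)
        return files


def prune_separately(index, files_by_partition, keys, partitions):
    """Two independent pruning paths, intersected at the end."""
    partition_files: Set[str] = set()
    for partition in partitions:
        partition_files |= files_by_partition.get(partition, set())
    index_files = index.lookup(keys)  # index path knows nothing about partitions
    return partition_files & index_files


def prune_with_hint(index, keys, partitions):
    """The index is told up front which partitions matter."""
    return index.lookup(keys, partition_hint=partitions)


# Made-up sample data: record key "k1" exists in two partitions.
SHARDS = {
    "2024-01": {"k1": "f1", "k2": "f2"},
    "2024-02": {"k1": "f3", "k3": "f4"},
    "2024-03": {"k4": "f5"},
}
FILES_BY_PARTITION = {
    "2024-01": {"f1", "f2"},
    "2024-02": {"f3", "f4"},
    "2024-03": {"f5"},
}
```

For the lookup of key "k1" in partition "2024-02", both functions select the 
same file, but in this toy model the unhinted path probes all three shards 
while the hinted path probes only one.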
I also suggest making this partition hint a general hint, since in the future 
other indexes might also be able to make use of this info.
If this is worth a retro, let's create a CU tracking that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
