TengHuo opened a new issue, #7106:
URL: https://github.com/apache/hudi/issues/7106

   I couldn't find a proposal template, so I raised an issue here. I can open an 
RFC PR if needed, and will open a PR for code review later.
   
   ## Background
   
   In `HoodieMergeOnReadRDD`, Alexey added column pruning support in PR 
https://github.com/apache/hudi/pull/4888. This is a nice improvement: it can 
speed up queries on MOR _rt tables significantly. 
   
   However, this performance improvement is gated by 
`whitelistedPayloadClasses`, so column pruning is only supported for 
`OverwriteWithLatestAvroPayload`. Any other payload class, including a custom 
implementation, cannot use this feature.
   
   
https://github.com/apache/hudi/blob/df69aa75dbd42c2a0e96e67746fb1c57fa27888c/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala#L84
   
   ## Proposal
   
   After studying the column pruning feature in `HoodieMergeOnReadRDD` 
implemented by Alexey, **we added 2 new methods to the `HoodieRecordPayload` 
interface that tell `HoodieMergeOnReadRDD` whether column pruning can be applied 
to a payload class, and which extra columns, if any, are needed for merging**.
   
   ```java
   public interface HoodieRecordPayload<T extends HoodieRecordPayload> extends Serializable {
     /*...*/
   
     // The 2 new methods we added
     /**
      * Tells HoodieBaseRelation whether column pruning can be applied for this
      * payload implementation.
      * By default, column pruning is used for MOR tables that use
      * OverwriteWithLatestAvroPayload.
      *
      * @return true if this payload supports column pruning when querying a MOR table
      */
     @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
     default boolean canApplyColumnPrune() {
       return false;
     }
   
     /**
      * Returns the set of fields that are mandatory for preCombine and
      * combineAndGetUpdateValue.
      *
      * @return the columns that preCombine and combineAndGetUpdateValue need
      */
     @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
     default Set<String> getMergeNeedFields() {
       return Collections.emptySet();
     }
   }
   ```
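
To illustrate how a reader could consume these two methods, here is a minimal, self-contained sketch. It does not use the real Hudi classes: `PrunablePayload`, `EventTimePayload`, `OpaquePayload`, `columnsToRead`, and the column names are all hypothetical stand-ins, and the actual logic in `HoodieMergeOnReadRDD` may differ.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class ColumnPruneSketch {

  // Hypothetical stand-in carrying only the two proposed methods;
  // this is NOT the real HoodieRecordPayload interface.
  interface PrunablePayload extends Serializable {
    default boolean canApplyColumnPrune() {
      return false;
    }

    default Set<String> getMergeNeedFields() {
      return Collections.emptySet();
    }
  }

  // Hypothetical custom payload: opts in to pruning and declares that
  // merging needs the (assumed) ordering column "ts".
  static final class EventTimePayload implements PrunablePayload {
    @Override
    public boolean canApplyColumnPrune() {
      return true;
    }

    @Override
    public Set<String> getMergeNeedFields() {
      return Collections.singleton("ts");
    }
  }

  // Hypothetical payload that keeps the defaults, so pruning stays off.
  static final class OpaquePayload implements PrunablePayload {
  }

  // Assumed reader-side rule: if the payload allows pruning, read the queried
  // columns plus the merge columns; otherwise fall back to the full schema.
  static Set<String> columnsToRead(PrunablePayload payload,
                                   Set<String> tableColumns,
                                   Set<String> queriedColumns) {
    if (!payload.canApplyColumnPrune()) {
      return new LinkedHashSet<>(tableColumns);
    }
    Set<String> required = new LinkedHashSet<>(queriedColumns);
    required.addAll(payload.getMergeNeedFields());
    return required;
  }

  public static void main(String[] args) {
    Set<String> table = new LinkedHashSet<>(Arrays.asList("id", "name", "price", "ts"));
    Set<String> query = Collections.singleton("name");

    // prints [name, ts]
    System.out.println(columnsToRead(new EventTimePayload(), table, query));
    // prints [id, name, price, ts]
    System.out.println(columnsToRead(new OpaquePayload(), table, query));
  }
}
```

Because both methods have defaults, an existing payload that overrides neither keeps reading the full schema, so in this sketch the change is opt-in per payload class.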
   
   We have implemented this feature on the Spark side, and have also started 
the dev work to support it in Trino.
   
   Hi @alexeykudinkin, do you have any suggestions about this feature? Do we 
need to open an RFC PR for it?
   
   

