TengHuo opened a new issue, #7106: URL: https://github.com/apache/hudi/issues/7106
I couldn't find a proposal template, so I am raising an issue here. I can raise an RFC PR if needed, and will raise a PR for code review later.

## Background

In `HoodieMergeOnReadRDD`, Alexey added column pruning support in PR https://github.com/apache/hudi/pull/4888, which significantly speeds up queries on MOR `_rt` tables.

However, this performance improvement is gated by `whitelistedPayloadClasses`, so column pruning is only supported for `OverwriteWithLatestAvroPayload`. Any other payload class, including a custom implementation, can't use this feature.

https://github.com/apache/hudi/blob/df69aa75dbd42c2a0e96e67746fb1c57fa27888c/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala#L84

## Proposal

After studying the column pruning feature in `HoodieMergeOnReadRDD` implemented by Alexey, **we added two new methods to the `HoodieRecordPayload` interface to tell `HoodieMergeOnReadRDD` whether column pruning can be applied to a payload class, and whether any extra columns are needed for merging**.

```java
public interface HoodieRecordPayload<T extends HoodieRecordPayload> extends Serializable {

  /*...*/

  // The two new methods we added

  /**
   * Tells HoodieBaseRelation whether column pruning can be applied for this payload implementation.
   * By default, column pruning is only used for MOR tables that use OverwriteWithLatestAvroPayload.
   *
   * @return true if column pruning can be applied for this payload when querying a MOR table
   */
  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
  default boolean canApplyColumnPrune() {
    return false;
  }

  /**
   * Returns the set of fields that are mandatory for preCombine and combineAndGetUpdateValue.
   *
   * @return the set of columns that preCombine and combineAndGetUpdateValue need
   */
  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
  default Set<String> getMergeNeedFields() {
    return Collections.emptySet();
  }
}
```

We have implemented this feature on the Spark side, and have also started development work to support it in Trino.
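To illustrate how a reader might consume the two proposed methods, here is a minimal, self-contained sketch. It does not use Hudi classes: `PrunablePayload`, `LatestByTsPayload`, `LegacyPayload`, `columnsToRead`, and the column names are all hypothetical stand-ins, and the logic is only an assumption about how the methods could be used, not the actual `HoodieMergeOnReadRDD` code. It also ignores any other columns a real reader may need to keep.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ColumnPruneSketch {

  // Hypothetical stand-in that mirrors only the two proposed methods.
  interface PrunablePayload {
    default boolean canApplyColumnPrune() {
      return false;
    }

    default Set<String> getMergeNeedFields() {
      return Collections.emptySet();
    }
  }

  // Hypothetical custom payload that opts in and needs "ts" to merge records.
  static class LatestByTsPayload implements PrunablePayload {
    @Override
    public boolean canApplyColumnPrune() {
      return true;
    }

    @Override
    public Set<String> getMergeNeedFields() {
      return Collections.singleton("ts");
    }
  }

  // Hypothetical payload that keeps the defaults, so no pruning is applied.
  static class LegacyPayload implements PrunablePayload {
  }

  // Decide which columns to read: the full schema when pruning is not allowed,
  // otherwise the queried columns plus the merge columns, in table-schema order.
  static List<String> columnsToRead(PrunablePayload payload,
                                    List<String> tableColumns,
                                    List<String> queryColumns) {
    if (!payload.canApplyColumnPrune()) {
      return tableColumns;
    }
    Set<String> required = new LinkedHashSet<>(queryColumns);
    required.addAll(payload.getMergeNeedFields());
    List<String> result = new ArrayList<>();
    for (String column : tableColumns) {
      if (required.contains(column)) {
        result.add(column);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> table = List.of("id", "ts", "name", "price", "city");
    List<String> query = List.of("name");
    System.out.println(columnsToRead(new LatestByTsPayload(), table, query)); // [ts, name]
    System.out.println(columnsToRead(new LegacyPayload(), table, query));     // [id, ts, name, price, city]
  }
}
```

In this sketch a payload that keeps the defaults still reads the full schema, so existing payload implementations would behave as before, while an opted-in payload only adds the columns it declares for merging.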
Hi @alexeykudinkin , any suggestions about this feature? Do we need to raise an RFC PR for this feature?
