[
https://issues.apache.org/jira/browse/HUDI-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17473243#comment-17473243
]
YangXuan commented on HUDI-2776:
--------------------------------
Please add the link to the PR, thank you.
> Cluster update strategy should not be fenced by write config
> ------------------------------------------------------------
>
> Key: HUDI-2776
> URL: https://issues.apache.org/jira/browse/HUDI-2776
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: sivabalan narayanan
> Assignee: Sagar Sumit
> Priority: Blocker
> Fix For: 0.10.0
>
>
> In a multi-writer scenario, not every writer necessarily enables clustering
> in its write config. The relevant code is in BaseSparkCommitActionExecutor:
> {code:java}
> private JavaRDD<HoodieRecord<T>> clusteringHandleUpdate(JavaRDD<HoodieRecord<T>> inputRecordsRDD) {
>   if (config.isClusteringEnabled()) {
>     Set<HoodieFileGroupId> fileGroupsInPendingClustering =
>         table.getFileSystemView().getFileGroupsInPendingClustering()
>             .map(entry -> entry.getKey()).collect(Collectors.toSet());
>     UpdateStrategy updateStrategy = (UpdateStrategy) ReflectionUtils
>         .loadClass(config.getClusteringUpdatesStrategyClass(), this.context, fileGroupsInPendingClustering);
>     return (JavaRDD<HoodieRecord<T>>) updateStrategy.handleUpdate(inputRecordsRDD);
>   } else {
>     return inputRecordsRDD;
>   }
> }
> {code}
> When clustering is scheduled and being executed by writer1, writer2 can still
> update the same file group without being rejected, because writer2 did not
> enable clustering in its write config and therefore skips the update strategy
> check entirely.
>
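The scenario described in the issue can be reproduced with a small stand-alone model. This is only a sketch and does not use any Hudi classes: file groups are plain strings, and the names ClusteringGateSketch, gatedHandleUpdate, ungatedHandleUpdate and UpdateRejectedException are hypothetical. It also assumes the configured update strategy rejects updates to file groups that are pending clustering. The gated variant mirrors the check on config.isClusteringEnabled() quoted above; the ungated variant shows the behaviour the issue title asks for.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone model of the issue; not Hudi code.
class ClusteringGateSketch {

    // Hypothetical exception: an update hit a file group pending clustering.
    static class UpdateRejectedException extends RuntimeException {
        UpdateRejectedException(String message) {
            super(message);
        }
    }

    // Gated variant: the check runs only if THIS writer enabled clustering,
    // which mirrors the config.isClusteringEnabled() branch in the issue.
    static List<String> gatedHandleUpdate(boolean clusteringEnabled,
                                          Set<String> pendingClustering,
                                          List<String> updatedFileGroups) {
        if (clusteringEnabled) {
            return rejectIfPending(pendingClustering, updatedFileGroups);
        }
        // Writer did not enable clustering: updates pass through unchecked.
        return updatedFileGroups;
    }

    // Ungated variant: every writer is checked, whatever its own config says.
    static List<String> ungatedHandleUpdate(Set<String> pendingClustering,
                                            List<String> updatedFileGroups) {
        return rejectIfPending(pendingClustering, updatedFileGroups);
    }

    // Assumed reject-style strategy: fail on the first conflicting file group.
    static List<String> rejectIfPending(Set<String> pendingClustering,
                                        List<String> updatedFileGroups) {
        for (String fileGroup : updatedFileGroups) {
            if (pendingClustering.contains(fileGroup)) {
                throw new UpdateRejectedException(
                    "File group " + fileGroup + " has pending clustering");
            }
        }
        return updatedFileGroups;
    }

    public static void main(String[] args) {
        // writer1 has scheduled clustering on fg-1; writer2 updates fg-1 and fg-2.
        Set<String> pending = Set.of("fg-1");
        List<String> writer2Updates = List.of("fg-1", "fg-2");

        // writer2 has clustering disabled, so the gated check lets it through.
        System.out.println("gated:   accepted "
            + gatedHandleUpdate(false, pending, writer2Updates));

        try {
            ungatedHandleUpdate(pending, writer2Updates);
            System.out.println("ungated: accepted");
        } catch (UpdateRejectedException e) {
            System.out.println("ungated: rejected, " + e.getMessage());
        }
    }
}
```

In this model the set of pending file groups comes from shared table state rather than from the writer's own config, which is why the ungated check still catches writer2.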