sivabalan narayanan created HUDI-2776:
-----------------------------------------

             Summary: Cluster update strategy is fenced by write config
                 Key: HUDI-2776
                 URL: https://issues.apache.org/jira/browse/HUDI-2776
             Project: Apache Hudi
          Issue Type: Bug
    Affects Versions: 0.10.0
            Reporter: sivabalan narayanan


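In a multi-writer scenario, not every writer necessarily enables clustering in 
its write config. The check for updates to file groups that are pending 
clustering, however, only runs when that config is enabled for the writer.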

BaseSparkCommitActionExecutor
{code:java}
private JavaRDD<HoodieRecord<T>> clusteringHandleUpdate(JavaRDD<HoodieRecord<T>> inputRecordsRDD) {
  if (config.isClusteringEnabled()) {
    Set<HoodieFileGroupId> fileGroupsInPendingClustering =
        table.getFileSystemView().getFileGroupsInPendingClustering()
            .map(entry -> entry.getKey())
            .collect(Collectors.toSet());
    UpdateStrategy updateStrategy = (UpdateStrategy) ReflectionUtils
        .loadClass(config.getClusteringUpdatesStrategyClass(), this.context, fileGroupsInPendingClustering);
    return (JavaRDD<HoodieRecord<T>>) updateStrategy.handleUpdate(inputRecordsRDD);
  } else {
    return inputRecordsRDD;
  }
}
{code}
When clustering has been scheduled and is being executed by writer1, writer2 
can still update the same file groups without being stopped: writer2 did not 
enable clustering in its write config, so it skips the update strategy check 
above and never looks at the file groups in pending clustering.
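
To make the gap concrete, here is a minimal, self-contained sketch that models the behaviour with plain Java collections. It is not Hudi code: the class name, the method names, and the use of strings as file group ids are all hypothetical stand-ins. It only contrasts a check gated on the writer's own config with a check that always consults the pending-clustering set, which is one possible direction for a fix and not necessarily the one that should be adopted.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of the issue; none of these names exist in Hudi.
final class PendingClusteringGateSketch {

    // Mirrors the current gating: the pending-clustering set is consulted
    // only when this particular writer has clustering enabled.
    static List<String> conflictsSeenWhenGated(boolean clusteringEnabledForThisWriter,
                                               Set<String> fileGroupsInPendingClustering,
                                               List<String> updatedFileGroups) {
        if (!clusteringEnabledForThisWriter) {
            return List.of(); // check skipped, so no conflict is ever reported
        }
        return conflictsSeenWhenAlwaysChecked(fileGroupsInPendingClustering, updatedFileGroups);
    }

    // Assumed alternative: consult the pending-clustering set regardless of
    // what this writer's own config says.
    static List<String> conflictsSeenWhenAlwaysChecked(Set<String> fileGroupsInPendingClustering,
                                                       List<String> updatedFileGroups) {
        return updatedFileGroups.stream()
                .filter(fileGroupsInPendingClustering::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // writer1 scheduled clustering on fg-1; writer2 updates fg-1 and fg-2.
        Set<String> pending = Set.of("fg-1");
        List<String> writer2Updates = List.of("fg-1", "fg-2");

        // writer2 has clustering disabled, so the gated check reports nothing.
        System.out.println(conflictsSeenWhenGated(false, pending, writer2Updates)); // prints []

        // The ungated check detects the overlapping file group.
        System.out.println(conflictsSeenWhenAlwaysChecked(pending, writer2Updates)); // prints [fg-1]
    }
}
```

In this model the table state (the pending-clustering set) is shared by both writers, while the flag is local to each writer, which is why gating on the flag lets writer2's update to fg-1 go unnoticed.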

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
