sivabalan narayanan created HUDI-2776:
-----------------------------------------
Summary: Cluster update strategy is fenced by write config
Key: HUDI-2776
URL: https://issues.apache.org/jira/browse/HUDI-2776
Project: Apache Hudi
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: sivabalan narayanan
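In a multi-writer scenario, not every writer necessarily enables clustering in
its write config. However, BaseSparkCommitActionExecutor applies the clustering
update strategy only when the local writer has clustering enabled: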
{code:java}
private JavaRDD<HoodieRecord<T>> clusteringHandleUpdate(JavaRDD<HoodieRecord<T>> inputRecordsRDD) {
  if (config.isClusteringEnabled()) {
    Set<HoodieFileGroupId> fileGroupsInPendingClustering =
        table.getFileSystemView().getFileGroupsInPendingClustering()
            .map(entry -> entry.getKey()).collect(Collectors.toSet());
    UpdateStrategy updateStrategy = (UpdateStrategy) ReflectionUtils
        .loadClass(config.getClusteringUpdatesStrategyClass(), this.context, fileGroupsInPendingClustering);
    return (JavaRDD<HoodieRecord<T>>) updateStrategy.handleUpdate(inputRecordsRDD);
  } else {
    return inputRecordsRDD;
  }
}
{code}
When clustering has been scheduled and is being executed by writer1, writer2
can still update the same file groups without the update strategy ever being
consulted, because writer2 did not enable clustering in its write config.
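
The scenario can be illustrated with a minimal, self-contained sketch. It is not Hudi code: the class ClusteringFenceSketch, its methods, and UpdateRejectedException are hypothetical names, file groups are modelled as plain strings, and the update strategy is assumed to be one that rejects updates to file groups in pending clustering. The first method mirrors the config-gated check quoted above; the second shows one possible direction, deciding from the table's pending-clustering state alone.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the check; none of these names are Hudi APIs.
final class ClusteringFenceSketch {

    /** Thrown when an update targets a file group that has clustering pending. */
    static final class UpdateRejectedException extends RuntimeException {
        UpdateRejectedException(String message) {
            super(message);
        }
    }

    /**
     * Mirrors the reported behavior: the pending-clustering check only runs
     * when this writer's own config has clustering enabled.
     */
    static List<String> handleUpdateFencedByConfig(boolean clusteringEnabled,
                                                   Set<String> pendingFileGroups,
                                                   List<String> targetFileGroups) {
        if (!clusteringEnabled) {
            // The check is skipped entirely, whatever the table state is.
            return targetFileGroups;
        }
        return rejectIfPending(pendingFileGroups, targetFileGroups);
    }

    /**
     * Assumed alternative: consult the table's pending-clustering state
     * regardless of the local writer's clustering flag.
     */
    static List<String> handleUpdateFencedByTableState(Set<String> pendingFileGroups,
                                                       List<String> targetFileGroups) {
        return rejectIfPending(pendingFileGroups, targetFileGroups);
    }

    // Assumed reject-style strategy: fail if any target has clustering pending.
    private static List<String> rejectIfPending(Set<String> pendingFileGroups,
                                                List<String> targetFileGroups) {
        for (String fileGroup : targetFileGroups) {
            if (pendingFileGroups.contains(fileGroup)) {
                throw new UpdateRejectedException(
                    "Update targets file group with pending clustering: " + fileGroup);
            }
        }
        return targetFileGroups;
    }
}
```

With a pending set containing "fg-1", a writer that has clustering disabled passes an update to "fg-1" straight through the first method, while the second method rejects the same update.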