codope commented on code in PR #4118:
URL: https://github.com/apache/hudi/pull/4118#discussion_r844982092
##########
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##########
@@ -124,7 +125,16 @@
// get all filegroups in the plan
getFileGroupEntriesInClusteringPlan(clusteringPlan.getLeft(), clusteringPlan.getRight()));
- Map<HoodieFileGroupId, HoodieInstant> resultMap = resultStream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+ Map<HoodieFileGroupId, HoodieInstant> resultMap;
+ try {
+ resultMap = resultStream.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+ } catch (Exception e) {
+ if (e instanceof IllegalStateException && e.getMessage().contains("Duplicate key")) {
+ throw new HoodieException("Found duplicate file groups pending clustering. If you're running deltastreamer in continuous mode, consider adding delay using --min-sync-interval-seconds. "
Review Comment:
Anyway, we now have optimistic concurrency control (OCC) with the in-process
lock provider when metadata is enabled, so for deltastreamer/Spark streaming
users only need to set one config to adjust the concurrency mode:
`HoodieWriteConfig#AUTO_ADJUST_LOCK_CONFIGS`
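
For context on the hunk under review, here is a minimal standalone sketch of the pattern it uses: `Collectors.toMap` without a merge function throws `IllegalStateException` on a repeated key, and the caller rethrows it with a more actionable message. The class `DuplicateKeyDemo`, the method `toMapOrExplain`, and the exception `DuplicateFileGroupException` are hypothetical stand-ins (the PR uses `HoodieException` and Hudi's own key/value types), so this is an illustration under those assumptions, not the PR's code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DuplicateKeyDemo {

  // Hypothetical stand-in for HoodieException so the sketch compiles without Hudi.
  public static class DuplicateFileGroupException extends RuntimeException {
    public DuplicateFileGroupException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Collects entries into a map. A repeated key makes Collectors.toMap throw
  // IllegalStateException; that case is rethrown with a clearer message.
  public static Map<String, String> toMapOrExplain(Stream<Map.Entry<String, String>> entries) {
    try {
      return entries.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    } catch (IllegalStateException e) {
      String message = e.getMessage();
      // Guard against a null message before matching on its text.
      if (message != null && message.contains("Duplicate key")) {
        throw new DuplicateFileGroupException(
            "Found duplicate file groups pending clustering.", e);
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    // Strings stand in for file group ids (keys) and instants (values).
    Stream<Map.Entry<String, String>> duplicated = Stream.of(
        new SimpleEntry<>("fileGroup-1", "instant-001"),
        new SimpleEntry<>("fileGroup-1", "instant-002"));
    try {
      toMapOrExplain(duplicated);
    } catch (DuplicateFileGroupException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Matching on the exception message is somewhat brittle, since the exact wording is a JDK implementation detail. One alternative, not what the diff does, would be the three-argument `Collectors.toMap` overload with a merge function that throws the custom exception directly.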