[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

GitBox Wed, 08 Dec 2021 17:40:24 -0800


xiarixiaoyao commented on a change in pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#discussion_r765371275




##########
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
##########
@@ -95,8 +95,7 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, 
HoodieEngineContext
         .map(inputGroup -> runClusteringForGroupAsync(inputGroup,
             clusteringPlan.getStrategy().getStrategyParams(),
             
Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
-            instantTime))
-        .map(CompletableFuture::join);
+            
instantTime)).collect(Collectors.toList()).stream().map(CompletableFuture::join);

Review comment:
       @alexeykudinkin   thanks for your review.  
   no,  this change can guarantee jobs will run in parallel. we should do 
collect(Collectors.toList()) before join the future.
   the test result and our user test result(see [SUPPORT] Zordering clustering 
on a moderate size dataset taking large amounts of time. #4135 )  see this 
modify is ok.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

Reply via email to