[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

via GitHub Mon, 24 Apr 2023 23:27:34 -0700


zhuanshenbsj1 commented on code in PR #8505:
URL: https://github.com/apache/hudi/pull/8505#discussion_r1176051746



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDTableServiceClient.java:
##########
@@ -245,6 +246,7 @@ private void completeClustering(HoodieReplaceCommitMetadata metadata,
           metrics.updateCommitMetrics(parsedInstant.getTime(), durationInMs, metadata, HoodieActiveTimeline.REPLACE_COMMIT_ACTION)
       );
     }
+    waitForAsyncServiceCompletion();
     LOG.info("Clustering successfully on commit " + clusteringCommitTime);

Review Comment:
   Without this change, if ASYNC_CLEAN = true, AsyncCleanerService is used to 
run the clean in the offline job. In my unit tests for the offline job, if the 
compaction/clustering job completes before the async cleaning job does, 
BaseHoodieTableServiceClient.close() forcibly shuts down the async cleaning 
job, which raises an InterruptedException and aborts the clean.
   So I added this wait so that the whole job waits for the clean to complete 
before exiting gracefully.
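
   The race described above can be reproduced outside Hudi with plain 
java.util.concurrent. The following is only a minimal sketch under stated 
assumptions: AsyncCleanSketch and runJob are hypothetical names, the 
single-thread executor stands in for AsyncCleanerService, Future.get() stands 
in for waitForAsyncServiceCompletion(), and shutdownNow() stands in for the 
forced shutdown in close(). It does not use or mirror the actual Hudi code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in, not Hudi code: models a table service job that
// finishes before its background clean does.
final class AsyncCleanSketch {

  /** Returns true if the background clean ran to completion, false if it was interrupted. */
  static boolean runJob(boolean waitForClean) {
    ExecutorService cleanerPool = Executors.newSingleThreadExecutor();
    AtomicBoolean cleanFinished = new AtomicBoolean(false);
    CountDownLatch cleanStarted = new CountDownLatch(1);

    // Stand-in for the async clean: it takes longer than the main job.
    Future<?> clean = cleanerPool.submit(() -> {
      cleanStarted.countDown();
      try {
        Thread.sleep(500);
        cleanFinished.set(true);
      } catch (InterruptedException e) {
        // Forced shutdown lands here; the clean is abandoned.
        Thread.currentThread().interrupt();
      }
    });

    try {
      cleanStarted.await();
      // The compaction/clustering work is assumed to be complete at this point.
      if (waitForClean) {
        clean.get(); // plays the role of waitForAsyncServiceCompletion()
      }
      cleanerPool.shutdownNow(); // plays the role of the forced shutdown in close()
      cleanerPool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return cleanFinished.get();
  }

  public static void main(String[] args) {
    System.out.println("without wait, clean finished: " + runJob(false));
    System.out.println("with wait, clean finished: " + runJob(true));
  }
}
```

   In this sketch, skipping the wait lets shutdownNow() interrupt the running 
clean, while waiting on the future first lets the clean finish before the pool 
is shut down, which is the ordering the added line is meant to enforce.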



