RexXiong commented on code in PR #2535:
URL: https://github.com/apache/celeborn/pull/2535#discussion_r1623707056
##########
master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/HAMasterMetaManager.java:
##########
@@ -365,6 +365,11 @@ public void handleWorkerEvent(
}
}
+ @Override
+ public void handleReportWorkerDecommission(List<WorkerInfo> workers, String requestId) {
Review Comment:
This should submit the request to the Ratis server.
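A rough sketch of the idea, not the actual Celeborn or Ratis API: in HA mode the handler would hand the request to a replicated log, and the metadata change would be applied on every master once the entry is committed. Every name below (`ReplicatedLog`, `InMemoryLog`, `MetaState`, `HaMetaManager`) is a hypothetical stand-in, and workers are plain strings instead of `WorkerInfo`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HaDecommissionSketch {

  /** Hypothetical stand-in for the consensus layer (a Ratis server in the real code). */
  interface ReplicatedLog {
    void submit(String requestId, List<String> workers);
  }

  /** Hypothetical per-master metadata, updated only when a log entry is applied. */
  static class MetaState {
    final Set<String> decommissionWorkers = new HashSet<>();

    void applyDecommission(List<String> workers) {
      synchronized (decommissionWorkers) {
        decommissionWorkers.addAll(workers);
      }
    }
  }

  /** Toy log that treats every entry as committed at once and applies it to all replicas. */
  static class InMemoryLog implements ReplicatedLog {
    private final List<MetaState> replicas;

    InMemoryLog(List<MetaState> replicas) {
      this.replicas = replicas;
    }

    @Override
    public void submit(String requestId, List<String> workers) {
      // requestId is only carried along here; a real log could use it to identify retries.
      for (MetaState replica : replicas) {
        replica.applyDecommission(workers);
      }
    }
  }

  /** HA manager: never mutates local state directly, always goes through the log. */
  static class HaMetaManager {
    private final ReplicatedLog log;

    HaMetaManager(ReplicatedLog log) {
      this.log = log;
    }

    void handleReportWorkerDecommission(List<String> workers, String requestId) {
      log.submit(requestId, workers);
    }
  }

  public static void main(String[] args) {
    MetaState leader = new MetaState();
    MetaState follower = new MetaState();
    HaMetaManager manager = new HaMetaManager(new InMemoryLog(List.of(leader, follower)));
    manager.handleReportWorkerDecommission(List.of("worker-1"), "req-1");
    System.out.println(leader.decommissionWorkers.equals(follower.decommissionWorkers));
  }
}
```

The point of routing through the log is that a follower that later becomes leader already has the same decommission set; updating only the local manager would lose it on failover.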
##########
master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/AbstractMetaManager.java:
##########
@@ -436,6 +446,12 @@ public void updateWorkerEventMeta(int workerEventTypeValue, List<WorkerInfo> wor
}
}
+ public void updateMetaByReportWorkerDecommission(List<WorkerInfo> workers) {
+ synchronized (this.workers) {
+ decommissionWorkers.addAll(workers);
Review Comment:
Before this change, a decommissioning worker sends ReportWorkerUnavailable,
which indicates that the worker is about to shut down, so the client quickly
tells the worker to commit the associated partitions. If we change
shutdownWorkers to decommissionWorkers, decommission may take longer than before.
##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/Worker.scala:
##########
@@ -971,6 +986,14 @@ private[celeborn] class Worker(
}
serverBootstraps
}
+
+ private def isDecommissioning: Int = {
+ if (shutdown.get() && workerStatusManager.exitEventType == WorkerEventType.Decommission) {
Review Comment:
We need to use currentWorkerStatus.getState to check the worker status.