slfan1989 commented on code in PR #6660:
URL: https://github.com/apache/hadoop/pull/6660#discussion_r1539407981
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator/src/main/java/org/apache/hadoop/yarn/server/globalpolicygenerator/applicationcleaner/DefaultApplicationCleaner.java:
##########
@@ -46,47 +45,38 @@ public void run() {
LOG.info("Application cleaner run at time {}", now);
FederationStateStoreFacade facade = getGPGContext().getStateStoreFacade();
Review Comment:
Step 1: Retrieve all applications stored in the StateStore, which represents
all applications submitted to the Router.
Step 2: Use the Router's REST API to fetch all running applications. This API
collects applications from all active SubClusters.
Step 3: Compare the results of Step 1 and Step 2 to identify applications that
exist in Step 1 but not in Step 2, and delete these applications.
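Step 3 boils down to a set difference between the StateStore contents (Step 1) and the Router's running list (Step 2). The minimal Java sketch below assumes plain strings for application IDs. The class and method names (`AppCleanerSketch`, `findAppsToDelete`) are hypothetical and are not part of the Hadoop YARN API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AppCleanerSketch {

  // Hypothetical helper: Step 3 of the cleaner expressed as a set difference.
  // stateStoreApps is the result of Step 1, routerRunningApps of Step 2.
  static Set<String> findAppsToDelete(Set<String> stateStoreApps,
      Set<String> routerRunningApps) {
    Set<String> toDelete = new LinkedHashSet<>(stateStoreApps);
    // Any application the Router does not report as running is treated as finished.
    toDelete.removeAll(routerRunningApps);
    return toDelete;
  }

  public static void main(String[] args) {
    Set<String> stateStore = new LinkedHashSet<>(List.of("app1", "app2", "app3"));
    Set<String> running = new LinkedHashSet<>(List.of("app2"));
    System.out.println(findAppsToDelete(stateStore, running)); // [app1, app3]
  }
}
```

The deletion decision depends only on an application being absent from the Step 2 result. Nothing in this logic can tell a finished application apart from one whose SubCluster simply did not answer.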
There is a potential issue with this approach. If a SubCluster is undergoing
maintenance, such as an RM restart, Step 2 cannot fetch the complete list of
running applications. As a result, the comparison in Step 3 may mistakenly
delete applications that are still running.
We have three SubClusters: subClusterA, subClusterB, and subClusterC, with
an equal allocation ratio of 1:1:1.
We submit six applications through routerA:
app1 and app2 are allocated to subClusterA,
app3 and app4 to subClusterB,
and app5 and app6 to subClusterC.
Among these, app1, app3, and app5 have completed their execution, and we
expect to retain app2, app4, and app6 in the StateStore.
In the normal scenario, following the steps above:
Step 1: We will retrieve six applications [app1, app2, app3, app4, app5,
app6] from the StateStore.
Step 2: We will fetch three applications [app2, app4, app6] from the
Router's REST interface.
Step 3: By comparing Step 1 and Step 2, we can identify that applications
[app1, app3, app5] should be deleted.
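The normal scenario can be replayed with a small model in which each SubCluster reports its still-running applications and the Router merges all of them. The class name, the `runningBySubCluster` map and the `appsToDelete` method are hypothetical illustrations, not GPG or Router code.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NormalScenarioSketch {

  // Hypothetical model: applications still running in each SubCluster.
  static Map<String, List<String>> runningBySubCluster() {
    Map<String, List<String>> running = new LinkedHashMap<>();
    running.put("subClusterA", List.of("app2"));
    running.put("subClusterB", List.of("app4"));
    running.put("subClusterC", List.of("app6"));
    return running;
  }

  static Set<String> appsToDelete() {
    // Step 1: all applications recorded in the StateStore.
    Set<String> stateStore = new LinkedHashSet<>(
        List.of("app1", "app2", "app3", "app4", "app5", "app6"));
    // Step 2: every SubCluster answers, so the Router sees all running apps.
    Set<String> routerRunning = new LinkedHashSet<>();
    runningBySubCluster().values().forEach(routerRunning::addAll);
    // Step 3: StateStore entries not reported as running are deleted.
    Set<String> toDelete = new LinkedHashSet<>(stateStore);
    toDelete.removeAll(routerRunning);
    return toDelete;
  }

  public static void main(String[] args) {
    System.out.println(appsToDelete()); // [app1, app3, app5]
  }
}
```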
In the exceptional scenario, following the steps above:
Step 1: We will retrieve six applications [app1, app2, app3, app4, app5,
app6] from the StateStore.
Step 2: We will fetch the list of running applications from the Router's
REST interface. However, due to maintenance in subClusterB and subClusterC, we
can only obtain the applications running in subClusterA [app2].
Step 3: By comparing Step 1 and Step 2, we conclude that applications
[app1, app3, app4, app5, app6] should be deleted.
In this case, app4 and app6 are deleted by mistake, even though they are
still running.
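The same model, extended so that Step 2 only hears from reachable SubClusters, reproduces the erroneous deletion. The names (`MaintenanceScenarioSketch`, `routerRunningApps`, `appsToDelete`, `wronglyDeleted`) are hypothetical. Treating a SubCluster under maintenance as simply contributing nothing to Step 2 is an assumption drawn from the scenario described above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MaintenanceScenarioSketch {

  // Hypothetical model of the applications still running in each SubCluster.
  static final Map<String, List<String>> RUNNING = Map.of(
      "subClusterA", List.of("app2"),
      "subClusterB", List.of("app4"),
      "subClusterC", List.of("app6"));

  // Step 1: all applications recorded in the StateStore.
  static final List<String> STATE_STORE =
      List.of("app1", "app2", "app3", "app4", "app5", "app6");

  // Step 2: only reachable SubClusters contribute; SubClusters under
  // maintenance (e.g. during an RM restart) return nothing.
  static Set<String> routerRunningApps(Set<String> reachable) {
    Set<String> running = new LinkedHashSet<>();
    RUNNING.forEach((subCluster, apps) -> {
      if (reachable.contains(subCluster)) {
        running.addAll(apps);
      }
    });
    return running;
  }

  // Step 3: StateStore entries minus the (possibly incomplete) Step 2 result.
  static Set<String> appsToDelete(Set<String> reachable) {
    Set<String> toDelete = new LinkedHashSet<>(STATE_STORE);
    toDelete.removeAll(routerRunningApps(reachable));
    return toDelete;
  }

  // Applications selected for deletion that are in fact still running.
  static Set<String> wronglyDeleted(Set<String> reachable) {
    Set<String> wrong = appsToDelete(reachable);
    wrong.retainAll(routerRunningApps(RUNNING.keySet()));
    return wrong;
  }

  public static void main(String[] args) {
    Set<String> onlyA = Set.of("subClusterA");
    System.out.println(appsToDelete(onlyA));   // [app1, app3, app4, app5, app6]
    System.out.println(wronglyDeleted(onlyA)); // [app4, app6]
  }
}
```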
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]