slfan1989 commented on code in PR #6660:
URL: https://github.com/apache/hadoop/pull/6660#discussion_r1539407981


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator/src/main/java/org/apache/hadoop/yarn/server/globalpolicygenerator/applicationcleaner/DefaultApplicationCleaner.java:
##########
@@ -46,47 +45,38 @@ public void run() {
     LOG.info("Application cleaner run at time {}", now);
 
     FederationStateStoreFacade facade = getGPGContext().getStateStoreFacade();

Review Comment:
   Step 1: Retrieve all applications stored in the StateStore, i.e. all
applications submitted through the Router.
   Step 2: Use the Router's REST API to fetch all running applications. The
Router collects them from all active SubClusters.
   Step 3: Compare the results of Step 1 and Step 2, identify the applications
that exist in Step 1 but not in Step 2, and delete them.
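A minimal Java sketch of Steps 1 to 3, assuming application IDs are plain strings. The class and method names (`AppCleanerSketch`, `findAppsToDelete`) are hypothetical and are not taken from `DefaultApplicationCleaner` or `FederationStateStoreFacade`; the point is only that Step 3 is a set difference:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the cleaner's comparison; not the real GPG code.
public class AppCleanerSketch {

  /**
   * Step 3: returns the applications recorded in the StateStore (Step 1)
   * that the Router did not report as running (Step 2).
   */
  public static Set<String> findAppsToDelete(Collection<String> stateStoreApps,
      Collection<String> routerRunningApps) {
    // Start from every application found in the StateStore, keeping its order.
    Set<String> toDelete = new LinkedHashSet<>(stateStoreApps);
    // Drop everything the Router still reports as running.
    toDelete.removeAll(routerRunningApps);
    return toDelete;
  }

  public static void main(String[] args) {
    Set<String> toDelete = findAppsToDelete(
        Arrays.asList("app1", "app2", "app3"),
        Arrays.asList("app2"));
    System.out.println("Deletion candidates: " + toDelete);
  }
}
```

Note that the difference treats "not reported by the Router" and "finished" as the same thing, which is exactly where the problem below comes from.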
   
   There is a potential issue with this approach. If a SubCluster is
undergoing maintenance, such as an RM restart, Step 2 cannot fetch the complete
list of running applications. As a result, the comparison in Step 3 may
mistakenly delete applications that are still running.
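To make the failure mode concrete, here is a hypothetical model of Step 2. The names `RouterAggregationSketch`, `aggregateRunningApps`, `appA`, and `appB` are illustrative, not Router APIs. The sketch assumes the Router skips a SubCluster it cannot reach instead of failing the whole request, which is the behavior the concern above relies on:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the Router merging running apps across SubClusters.
public class RouterAggregationSketch {

  /**
   * Merges the running applications of every SubCluster that is not in
   * {@code unavailable}. Unreachable SubClusters are skipped without any
   * error reaching the caller (an assumption of this sketch).
   */
  public static List<String> aggregateRunningApps(
      Map<String, List<String>> runningBySubCluster, Set<String> unavailable) {
    List<String> merged = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : runningBySubCluster.entrySet()) {
      if (unavailable.contains(entry.getKey())) {
        continue; // e.g. the RM is restarting, so nothing is collected from it
      }
      merged.addAll(entry.getValue());
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, List<String>> running = new LinkedHashMap<>();
    running.put("subClusterA", Arrays.asList("appA"));
    running.put("subClusterB", Arrays.asList("appB"));

    // All SubClusters healthy: both running apps are reported.
    System.out.println(aggregateRunningApps(running, Collections.<String>emptySet()));
    // subClusterB under maintenance: appB is silently missing.
    System.out.println(aggregateRunningApps(running, Collections.singleton("subClusterB")));
  }
}
```

With subClusterB unavailable, `appB` simply disappears from the merged list, so a caller cannot distinguish "finished" from "not reported".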
   
   We have three SubClusters: subClusterA, subClusterB, and subClusterC, with 
an equal allocation ratio of 1:1:1.
   
   We submit six applications through routerA.
   
   app1 and app2 are allocated to subClusterA,
   app3 and app4 to subClusterB, and
   app5 and app6 to subClusterC.
   Among these, app1, app3, and app5 have completed their execution, and we 
expect to retain app2, app4, and app6 in the StateStore.
   
   In the normal scenario:
   
   Following the steps above:
   
   Step 1: We will retrieve six applications [app1, app2, app3, app4, app5, 
app6] from the StateStore.
   Step 2: We will fetch three applications [app2, app4, app6] from the 
Router's REST interface.
   Step 3: By comparing Step 1 and Step 2, we can identify that applications 
[app1, app3, app5] should be deleted.
   
   In the exceptional scenario:
   
   Following the steps above:
   
   Step 1: We will retrieve six applications [app1, app2, app3, app4, app5, 
app6] from the StateStore.
   Step 2: We will fetch the list of running applications from the Router's 
REST interface. However, due to maintenance in subClusterB and subClusterC, we 
can only obtain the applications running in subClusterA [app2].
   Step 3: By comparing Step 1 and Step 2, the cleaner would conclude that
applications [app1, app3, app4, app5, app6] should be deleted.
   
   In this case, app4 and app6 would be deleted by mistake, even though they
are still running in subClusterB and subClusterC.
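The two scenarios can be replayed with a small hypothetical Java sketch (`CleanerScenarioSketch`, `appsToDelete`, and `wronglyDeleted` are illustrative names only). It models each SubCluster's running applications, assumes the Router reports only reachable SubClusters, and then checks which deletion candidates are actually still running:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical replay of the example: three SubClusters, six applications.
public class CleanerScenarioSketch {

  // Step 1: every application recorded in the StateStore.
  static final List<String> STATE_STORE =
      Arrays.asList("app1", "app2", "app3", "app4", "app5", "app6");

  // Ground truth: applications still running in each SubCluster.
  static final Map<String, List<String>> RUNNING = new LinkedHashMap<>();
  static {
    RUNNING.put("subClusterA", Arrays.asList("app2"));
    RUNNING.put("subClusterB", Arrays.asList("app4"));
    RUNNING.put("subClusterC", Arrays.asList("app6"));
  }

  // Steps 2 and 3: the Router reports only reachable SubClusters, and every
  // StateStore entry it does not report becomes a deletion candidate.
  public static Set<String> appsToDelete(Collection<String> reachableSubClusters) {
    Set<String> reported = new HashSet<>();
    for (Map.Entry<String, List<String>> entry : RUNNING.entrySet()) {
      if (reachableSubClusters.contains(entry.getKey())) {
        reported.addAll(entry.getValue());
      }
    }
    Set<String> toDelete = new LinkedHashSet<>(STATE_STORE);
    toDelete.removeAll(reported);
    return toDelete;
  }

  // Deletion candidates that are in fact still running somewhere.
  public static Set<String> wronglyDeleted(Set<String> toDelete) {
    Set<String> wrong = new LinkedHashSet<>();
    for (List<String> apps : RUNNING.values()) {
      for (String app : apps) {
        if (toDelete.contains(app)) {
          wrong.add(app);
        }
      }
    }
    return wrong;
  }

  public static void main(String[] args) {
    Set<String> normal =
        appsToDelete(Arrays.asList("subClusterA", "subClusterB", "subClusterC"));
    Set<String> maintenance = appsToDelete(Arrays.asList("subClusterA"));
    System.out.println("normal:      delete " + normal
        + ", wrongly deleted " + wronglyDeleted(normal));
    System.out.println("maintenance: delete " + maintenance
        + ", wrongly deleted " + wronglyDeleted(maintenance));
  }
}
```

Under these assumptions, the normal scenario selects [app1, app3, app5] with nothing deleted wrongly, while the maintenance scenario selects [app1, app3, app4, app5, app6], of which app4 and app6 are still running.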



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

