slfan1989 commented on code in PR #6473:
URL: https://github.com/apache/hadoop/pull/6473#discussion_r1479929985
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator/src/main/java/org/apache/hadoop/yarn/server/globalpolicygenerator/applicationcleaner/DefaultApplicationCleaner.java:
##########
@@ -46,47 +49,37 @@ public void run() {
LOG.info("Application cleaner run at time {}", now);
FederationStateStoreFacade facade = getGPGContext().getStateStoreFacade();
+
try {
- // Get the candidate list from StateStore before calling router
- Set<ApplicationId> allStateStoreApps = new HashSet<>();
- List<ApplicationHomeSubCluster> response =
+ // Step1. Get the candidate list from StateStore before calling router
+ List<ApplicationHomeSubCluster> applicationHomeSubClusters =
facade.getApplicationsHomeSubCluster();
- for (ApplicationHomeSubCluster app : response) {
- allStateStoreApps.add(app.getApplicationId());
- }
- LOG.info("{} app entries in FederationStateStore",
allStateStoreApps.size());
-
- // Get the candidate list from Registry before calling router
- List<String> allRegistryApps = getRegistryClient().getAllApplications();
- LOG.info("{} app entries in FederationRegistry",
allStateStoreApps.size());
-
- // Get the list of known apps from Router
- Set<ApplicationId> routerApps = getRouterKnownApplications();
- LOG.info("{} known applications from Router", routerApps.size());
+ LOG.info("FederationStateStore has {} applications.",
applicationHomeSubClusters.size());
- // Clean up StateStore entries
- Set<ApplicationId> toDelete =
- Sets.difference(allStateStoreApps, routerApps);
-
Review Comment:
Step 1: Retrieve all applications stored in the StateStore, which represents
all applications submitted to the Router.
Step 2: Use the Router's REST API to fetch all running applications. This API
queries the applications from all active SubClusters.
Step 3: Compare the results of `Step 1` and `Step 2` to identify applications
that exist in `Step 1` but not in `Step 2`, and delete them.
There is a potential issue with this approach. If a SubCluster is undergoing
maintenance, such as an RM restart, `Step 2` cannot fetch the complete list of
running applications. The comparison in `Step 3` may then mistakenly delete
applications that are still running.
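
One possible way to guard against this, sketched below under stated assumptions: only treat an application as deletable when its home SubCluster actually answered the Router query. The class `SafeCleanupSketch`, the method `selectDeletable`, and the `respondedSubClusters` input are hypothetical and are not part of this PR or of `DefaultApplicationCleaner`; how the cleaner would learn which SubClusters responded is left open. Application and SubCluster ids are plain strings here to keep the sketch self-contained.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the PR.
final class SafeCleanupSketch {

  /**
   * Selects StateStore entries that look safe to delete.
   *
   * @param stateStoreApps application id -> home SubCluster id (result of Step 1)
   * @param routerApps application ids returned by the Router (result of Step 2)
   * @param respondedSubClusters SubClusters assumed to have answered the Router query
   * @return application ids missing from the Router result whose home
   *         SubCluster responded
   */
  static Set<String> selectDeletable(Map<String, String> stateStoreApps,
      Set<String> routerApps, Set<String> respondedSubClusters) {
    Set<String> deletable = new HashSet<>();
    for (Map.Entry<String, String> entry : stateStoreApps.entrySet()) {
      boolean homeResponded = respondedSubClusters.contains(entry.getValue());
      boolean knownToRouter = routerApps.contains(entry.getKey());
      // Skip applications whose home SubCluster did not respond: their
      // absence from the Router result says nothing about their state.
      if (homeResponded && !knownToRouter) {
        deletable.add(entry.getKey());
      }
    }
    return deletable;
  }
}
```

With this check, an application homed in a SubCluster that is restarting its RM is kept in the StateStore until a later run in which that SubCluster responds.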
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]