freemandealer commented on code in PR #58035:
URL: https://github.com/apache/doris/pull/58035#discussion_r2575951294


##########
fe/fe-core/src/main/java/org/apache/doris/cloud/CloudWarmUpJob.java:
##########
@@ -539,20 +540,42 @@ public void releaseClients() {
     private final void clearJobOnBEs() {
         try {
             initClients();
-            for (Map.Entry<Long, Client> entry : beToClient.entrySet()) {
+            // Iterate with explicit iterator so we can remove invalidated clients during iteration.
+            Iterator<Map.Entry<Long, Client>> iter = beToClient.entrySet().iterator();
+            while (iter.hasNext()) {
+                Map.Entry<Long, Client> entry = iter.next();
+                long beId = entry.getKey();
+                Client client = entry.getValue();
                 TWarmUpTabletsRequest request = new TWarmUpTabletsRequest();
                 request.setType(TWarmUpTabletsRequestType.CLEAR_JOB);
                 request.setJobId(jobId);
                 if (this.isEventDriven()) {
                     TWarmUpEventType event = getTWarmUpEventType();
                     if (event == null) {
-                        throw new IllegalArgumentException("Unknown SyncEvent " + syncEvent);
+                        // If event type is unknown, skip this BE but continue others.
+                        LOG.warn("Unknown SyncEvent {}, skip CLEAR_JOB for BE {}", syncEvent, beId);
+                        continue;
                     }
                     request.setEvent(event);
                 }
-                LOG.info("send warm up request to BE {}. job_id={}, request_type=CLEAR_JOB",
-                        entry.getKey(), jobId);
-                entry.getValue().warmUpTablets(request);
+                LOG.info("send warm up request to BE {}. job_id={}, request_type=CLEAR_JOB", beId, jobId);
+                try {
+                    client.warmUpTablets(request);
+                } catch (Exception e) {
+                    // If RPC to this BE fails, invalidate this client and remove it from map,
+                    // then continue to next BE so that one bad BE won't block others.
+                    LOG.warn("send warm up request to BE {} failed: {}", beId, e.getMessage());
+                    try {
+                        TNetworkAddress addr = beToAddr == null ? null : beToAddr.get(beId);
+                        if (addr != null) {
+                            ClientPool.backendPool.invalidateObject(addr, client);

Review Comment:
   The pool is only used locally here, so invalidating one client won't affect
   other RPCs or jobs. The point of the invalidation is to skip the failed BE
   and continue with the remaining ones.
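
   For illustration, a minimal sketch of the skip-and-continue pattern discussed
   here. It assumes a hypothetical `Client` interface and a plain map in place
   of the real Thrift client and `ClientPool`, so it only models removing the
   failed entry during iteration, not the pool invalidation itself:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipFailedBackends {
    /** Hypothetical stand-in for the per-BE RPC client; not the Doris API. */
    interface Client {
        void clearJob(long jobId) throws Exception;
    }

    /**
     * Sends a clear-job call to every client. A client whose call throws is
     * removed from the map through the iterator, and the loop moves on to the
     * next backend. Returns the IDs of the backends that were removed.
     */
    static List<Long> clearJobOnAll(Map<Long, Client> beToClient, long jobId) {
        List<Long> failed = new ArrayList<>();
        Iterator<Map.Entry<Long, Client>> iter = beToClient.entrySet().iterator();
        while (iter.hasNext()) {
            Map.Entry<Long, Client> entry = iter.next();
            try {
                entry.getValue().clearJob(jobId);
            } catch (Exception e) {
                // Removing via the iterator avoids ConcurrentModificationException.
                failed.add(entry.getKey());
                iter.remove();
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<Long, Client> clients = new LinkedHashMap<>();
        clients.put(1L, jobId -> { });
        clients.put(2L, jobId -> { throw new Exception("unreachable"); });
        clients.put(3L, jobId -> { });

        List<Long> failed = clearJobOnAll(clients, 42L);
        System.out.println("failed=" + failed + " remaining=" + clients.keySet());
    }
}
```

   In this sketch BE 2 fails, is dropped from the map, and BE 3 is still
   contacted, which is the behavior the change is after.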



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

