smengcl commented on a change in pull request #1552:
URL: https://github.com/apache/ozone/pull/1552#discussion_r523078800



##########
File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/states/endpoint/HeartbeatEndpointTask.java
##########
@@ -147,8 +147,12 @@ public void setDatanodeDetailsProto(DatanodeDetailsProto
       rpcEndpoint.setLastSuccessfulHeartbeat(ZonedDateTime.now());
       rpcEndpoint.zeroMissedCount();
     } catch (IOException ex) {
-      // put back the reports which failed to be sent
-      putBackReports(requestBuilder);
+      // don't resend reports to recon as it could be down for days
+      // DN is expected to work fine without recon and not go OOM
+      if (!rpcEndpoint.isPassive()) {
+        // put back the reports which failed to be sent
+        putBackReports(requestBuilder);
+      }

Review comment:
   Thanks for the comment, @linyiqun.
   
   Yes. If SCM is down for a very long time, the DN can still go OOM for the same reason as with Recon. So this is, at best, a short-term fix.
   
   > A better way I am thinking for this: if we can check from thrown exception to see if SCM/Recon is out of service, if yes, no need to put back report anymore.
   
   @nandakumar131 mentioned in the [jira comment](https://issues.apache.org/jira/browse/HDDS-4404?focusedCommentId=17228389&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17228389) that `putBackReports()` serves the purpose of informing SCM of block deletion, so we don't want to simply discard those reports while SCM is down. Checking the exact type of exception (e.g. whether it is a connection issue) is a good idea, though.
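A minimal sketch of what the exception check could look like. `ReportRetryPolicy`, `isConnectionFailure` and `shouldPutBackReports` are hypothetical names, not existing Ozone APIs, and the set of exception types treated as "connection failures" is an assumption. The policy shown (always put reports back for SCM; for a passive endpoint such as Recon, drop them only when the endpoint looks unreachable) is one possible reading of this discussion. The PR as written drops Recon reports on any `IOException`.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/** Hypothetical helper: decides whether failed reports should be put back. */
final class ReportRetryPolicy {
  // Guards against pathological cycles in the cause chain.
  private static final int MAX_CAUSE_DEPTH = 16;

  private ReportRetryPolicy() {}

  /** True if ex, or any exception in its cause chain, is a connection-level failure. */
  static boolean isConnectionFailure(Throwable ex) {
    Throwable t = ex;
    // RPC layers often wrap the original socket error, so walk the causes.
    for (int depth = 0; t != null && depth < MAX_CAUSE_DEPTH; depth++) {
      if (t instanceof ConnectException
          || t instanceof NoRouteToHostException
          || t instanceof SocketTimeoutException
          || t instanceof UnknownHostException) {
        return true;
      }
      t = t.getCause();
    }
    return false;
  }

  /** Assumed policy: keep reports for SCM; drop them for Recon only when it seems down. */
  static boolean shouldPutBackReports(boolean isPassive, IOException ex) {
    if (!isPassive) {
      // SCM (active endpoint): reports carry block deletion information, keep them.
      return true;
    }
    // Recon (passive endpoint): don't accumulate reports while it is unreachable,
    // but keep them for other, possibly transient, I/O errors.
    return !isConnectionFailure(ex);
  }
}
```

In the `catch (IOException ex)` block of the diff, this would replace the plain `!rpcEndpoint.isPassive()` check with `ReportRetryPolicy.shouldPutBackReports(rpcEndpoint.isPassive(), ex)`.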
   
   I agree that, in the long run, the fix could be to queue only the latest, necessary reports.
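One way to "only queue the latest report" is to coalesce pending reports by key, so memory is bounded by the number of distinct keys rather than by how long SCM or Recon stays down. This is a minimal sketch under assumptions: `LatestReportQueue` is a hypothetical class, and keying by full-state report type is an assumption. Reports that must not be coalesced (e.g. incremental or deletion-related ones) would need separate handling that is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical queue that keeps at most one pending report per key. */
final class LatestReportQueue<K, V> {
  // Insertion-ordered so reports are drained in the order their keys first appeared.
  private final Map<K, V> pending = new LinkedHashMap<>();

  /** Queue a new report, replacing any older pending report with the same key. */
  synchronized void put(K key, V report) {
    pending.put(key, report);
  }

  /** Put back a report that failed to send, unless a newer one was queued meanwhile. */
  synchronized void putBack(K key, V report) {
    pending.putIfAbsent(key, report);
  }

  /** Remove and return all pending reports. */
  synchronized List<V> drain() {
    List<V> out = new ArrayList<>(pending.values());
    pending.clear();
    return out;
  }

  synchronized int size() {
    return pending.size();
  }
}
```

`putBack` uses `putIfAbsent` so that a report that failed to send never overwrites a newer report of the same kind queued while the heartbeat was in flight.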




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
