linyiqun commented on a change in pull request #1552:
URL: https://github.com/apache/ozone/pull/1552#discussion_r520411203



##########
File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/states/endpoint/HeartbeatEndpointTask.java
##########
@@ -147,8 +147,12 @@ public void setDatanodeDetailsProto(DatanodeDetailsProto
       rpcEndpoint.setLastSuccessfulHeartbeat(ZonedDateTime.now());
       rpcEndpoint.zeroMissedCount();
     } catch (IOException ex) {
-      // put back the reports which failed to be sent
-      putBackReports(requestBuilder);
+      // don't resend reports to recon as it could be down for days
+      // DN is expected to work fine without recon and not go OOM
+      if (!rpcEndpoint.isPassive()) {
+        // put back the reports which failed to be sent
+        putBackReports(requestBuilder);
+      }

Review comment:
       The current change fixes the case where Recon is down.
   But if SCM is down, I think this could still lead to an OOM error, since we 
put the reports back again and again.
   A better approach might be to check the thrown exception to see whether 
SCM/Recon is out of service; if it is, there is no need to put the reports 
back anymore.
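
   A minimal sketch of that idea, to make the suggestion concrete. 
`ReportRetryPolicy` and both of its methods are hypothetical names, not 
existing Ozone code, and the set of exception types treated as "endpoint is 
out of service" is an assumption that would need to be checked against what 
the RPC layer actually throws.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

// Hypothetical helper (not part of Ozone): decides whether failed reports
// should be queued again based on the exception thrown by the heartbeat call.
final class ReportRetryPolicy {
  // Upper bound on how many nested causes are inspected.
  private static final int MAX_CAUSE_DEPTH = 16;

  private ReportRetryPolicy() { }

  // Returns true if the exception, or one of its causes, suggests that the
  // remote endpoint cannot be reached. The list of types is an assumption.
  static boolean isEndpointUnreachable(Throwable ex) {
    Throwable t = ex;
    for (int depth = 0; t != null && depth < MAX_CAUSE_DEPTH; depth++) {
      if (t instanceof ConnectException
          || t instanceof NoRouteToHostException
          || t instanceof SocketTimeoutException) {
        return true;
      }
      // RPC layers often wrap the original exception, so check the cause too.
      t = t.getCause();
    }
    return false;
  }

  // Reports are put back only for an active endpoint (not Recon) that still
  // looks reachable; otherwise they are dropped to avoid unbounded growth.
  static boolean shouldPutBackReports(boolean passiveEndpoint, IOException ex) {
    return !passiveEndpoint && !isEndpointUnreachable(ex);
  }
}
```

   In the catch block above, the condition could then become something like 
`ReportRetryPolicy.shouldPutBackReports(rpcEndpoint.isPassive(), ex)`. Dropping 
reports whenever SCM is unreachable trades memory safety for possibly lost 
reports, so a cap on the number of queued reports may be a gentler alternative.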






