sumitagrawl commented on PR #4384:
URL: https://github.com/apache/ozone/pull/4384#issuecomment-1467513755

   > With this implementation we are assuming duplicate work from retries will 
be minimal, hence no explicit retry check is needed. If SCM is retrying the 
command, it means it did [not get an ack back from the 
datanode](https://github.com/errose28/ozone/blob/890ade5302ac8d3c6e591642247680d599f0e1df/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L407-L408).
 Since these command acks are attached to the DN heartbeats, the node should go 
stale or dead before retries pile up excessively.
   > 
   > @sumitagrawl it currently looks like we are still sending commands to 
nodes that may be stale/dead/decommissioning, and nothing significant is done with 
the result of [this 
check](https://github.com/errose28/ozone/blob/890ade5302ac8d3c6e591642247680d599f0e1df/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/SCMBlockDeletingService.java#L137-L138).
 Could you look into this?
   
   Created a new Jira, HDDS-6548, for this. It will avoid caching repeated 
transactions when a DN is not responsive, and clean up the cache when DNs are 
stale/dead/decommissioning.
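A minimal Java sketch of the behavior described for HDDS-6548, assuming SCM keeps a per-datanode cache of in-flight delete transaction IDs. The class `PendingDeleteTxnCache`, its methods, and the `NodeState` enum are hypothetical illustrations, not Ozone APIs; `NodeState` also collapses Ozone's separate health and operational states into one enum for brevity. It shows three points: a retried transaction does not grow the cache, nothing is cached for a node that is not healthy, and a node's entries are purged once it becomes stale, dead, or decommissioning.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch only; names are illustrative and not part of Ozone. */
final class PendingDeleteTxnCache {

  /** Simplified stand-in that merges node health and operational state. */
  enum NodeState { HEALTHY, STALE, DEAD, DECOMMISSIONING }

  // In-flight (sent but not yet acked) transaction IDs, keyed by datanode ID.
  private final Map<UUID, Set<Long>> pendingByNode = new HashMap<>();

  /**
   * Records that txId was sent to dn. Returns true only if it was newly cached:
   * unhealthy nodes are skipped, and a retried txId is already in the set.
   */
  boolean recordSent(UUID dn, NodeState state, long txId) {
    if (state != NodeState.HEALTHY) {
      // Do not queue work for a node that is unlikely to ack it.
      return false;
    }
    return pendingByNode.computeIfAbsent(dn, k -> new LinkedHashSet<>()).add(txId);
  }

  /** Called when a DN heartbeat carries an ack for txId. */
  void onAck(UUID dn, long txId) {
    Set<Long> pending = pendingByNode.get(dn);
    if (pending != null) {
      pending.remove(txId);
      if (pending.isEmpty()) {
        pendingByNode.remove(dn);
      }
    }
  }

  /** Called on a node state change; drops all cached entries for unusable nodes. */
  void onNodeStateChange(UUID dn, NodeState newState) {
    if (newState != NodeState.HEALTHY) {
      pendingByNode.remove(dn);
    }
  }

  /** Number of transactions still awaiting an ack from dn. */
  int pendingCount(UUID dn) {
    Set<Long> pending = pendingByNode.get(dn);
    return pending == null ? 0 : pending.size();
  }
}
```

Using a set per datanode makes retries idempotent with respect to cache size; the actual change in HDDS-6548 may take a different approach.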


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
