errose28 commented on PR #4384: URL: https://github.com/apache/ozone/pull/4384#issuecomment-1467348898
With this implementation we are assuming that duplicate work from retries will be minimal, so no explicit retry check is needed. If SCM is retrying the command, it means it did [not get an ack back from the datanode](https://github.com/errose28/ozone/blob/890ade5302ac8d3c6e591642247680d599f0e1df/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L407-L408). Since these command acks are attached to the DN heartbeats, a node that keeps failing to ack should go stale or dead before retries pile up excessively.

@sumitagrawl it currently looks like we are still sending commands to nodes that may be stale, dead, or decommissioning, because nothing significant is done with the result of [this check](https://github.com/errose28/ozone/blob/890ade5302ac8d3c6e591642247680d599f0e1df/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/SCMBlockDeletingService.java#L137-L138). Could you look into this?
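A minimal sketch of what acting on that check could look like, assuming SCM should only send block-delete commands to datanodes that are both healthy and in service. The types `Health`, `OpState`, `Datanode` and the methods `canReceiveDeleteCommand` / `selectTargets` are hypothetical illustrations, not Ozone's actual node manager or `SCMBlockDeletingService` APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteCommandTargetFilter {

  // Hypothetical health states (assumption; Ozone's real node states may differ).
  enum Health { HEALTHY, STALE, DEAD }

  // Hypothetical operational states (assumption).
  enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

  // Hypothetical view of a datanode as seen by SCM.
  record Datanode(String id, Health health, OpState opState) {}

  // Only healthy, in-service nodes get delete commands, so a stale, dead,
  // or decommissioning node is skipped instead of accumulating retries.
  static boolean canReceiveDeleteCommand(Datanode dn) {
    return dn.health() == Health.HEALTHY && dn.opState() == OpState.IN_SERVICE;
  }

  // Returns the ids of the nodes that pass the check, in input order.
  static List<String> selectTargets(List<Datanode> nodes) {
    List<String> targets = new ArrayList<>();
    for (Datanode dn : nodes) {
      if (canReceiveDeleteCommand(dn)) {
        targets.add(dn.id());
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    List<Datanode> nodes = List.of(
        new Datanode("dn1", Health.HEALTHY, OpState.IN_SERVICE),
        new Datanode("dn2", Health.STALE, OpState.IN_SERVICE),
        new Datanode("dn3", Health.HEALTHY, OpState.DECOMMISSIONING),
        new Datanode("dn4", Health.DEAD, OpState.IN_SERVICE));
    System.out.println(selectTargets(nodes)); // prints [dn1]
  }
}
```

Whether in-maintenance nodes should also be excluded is a policy choice; the sketch conservatively allows only `IN_SERVICE`.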
