ashishkumar50 commented on PR #5293: URL: https://github.com/apache/ozone/pull/5293#issuecomment-1719589461
> It feels like we should be storing the DNs we sent the commands to, and checking off the replies against what we sent, rather than expecting replies from "all the current replica DNs", as the current replicas could have changed between when the command was issued and when the replies were received.

SCM already keeps in memory which DNs a command has been sent to and which DNs have responded, so that it does not retry on a DN that has already responded successfully. This [in-memory map](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L423) is kept until all the replicas have responded, because only then can we remove the deleteTransactionID from RocksDB. If the current replicas change after the command is sent, the next retry sends the delete request to the changed replica DN.

> I think there should be a timeout, after which it discards the pending commands, and sends again to the current replica list and waits for a reply there. What do you think?

On every retry it takes the replicas from [containerManager](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L271) and sends the delete request to those DNs, so it always uses the current replicas.

> What about, we have 3 replicas. We send 3 deletes, then one node goes down making it under replicated. We never get a reply from that node. Replication makes a 3rd copy before checking. This was not under-replicated at the start and the change here would not help I think.

This behaviour is the same before and after this change. During retry it will send the delete request to the new replica.

### Current behaviour:
Assume replication factor: 3.

If only 2 replicas are currently present, send delete to those 2 DNs. The entry remains in memory ([transactionToDNsCommitMap](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L282)) because it requires 3 responses. After some number of retries, if another replica comes up, send delete to the third replica DN. Then remove the entry from memory, and after that remove it from RocksDB.

### Behaviour after this PR:
Assume replication factor: 3.

If only 1-2 replicas are currently present, wait until the replica count reaches the replication factor before sending delete. Once the replication factor is satisfied, send delete to all 3 DNs. In most cases this avoids keeping entries in [transactionToDNsCommitMap](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L282) for a long time.
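
To make the difference between the two behaviours concrete, here is a minimal sketch. It is not the actual `DeletedBlockLogImpl` code: the class `PendingDeleteSketch`, its methods, the `waitForFullReplication` flag, and the use of plain strings as DN identifiers are all hypothetical, and the completion check (acknowledgements >= replication factor) is a simplifying assumption. The map only stands in for the role that `transactionToDNsCommitMap` plays in the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; names and logic are assumptions, not Ozone APIs. */
final class PendingDeleteSketch {
  // txId -> DNs that have acknowledged the delete
  // (stands in for the role of transactionToDNsCommitMap).
  private final Map<Long, Set<String>> acked = new HashMap<>();
  private final int replicationFactor;
  // false = behaviour before the PR, true = behaviour after the PR (assumed flag).
  private final boolean waitForFullReplication;

  PendingDeleteSketch(int replicationFactor, boolean waitForFullReplication) {
    this.replicationFactor = replicationFactor;
    this.waitForFullReplication = waitForFullReplication;
  }

  /** Returns the DNs that should receive the delete command on this retry. */
  Set<String> selectTargets(long txId, Set<String> currentReplicas) {
    if (waitForFullReplication && currentReplicas.size() < replicationFactor) {
      // Hold the delete back; nothing is tracked in memory yet.
      return new HashSet<>();
    }
    Set<String> done = acked.computeIfAbsent(txId, k -> new HashSet<>());
    Set<String> targets = new HashSet<>(currentReplicas);
    targets.removeAll(done); // do not resend to DNs that already responded
    return targets;
  }

  /** Records a successful reply; returns true once the transaction can be purged. */
  boolean onAck(long txId, String dn) {
    Set<String> done = acked.computeIfAbsent(txId, k -> new HashSet<>());
    done.add(dn);
    if (done.size() >= replicationFactor) {
      acked.remove(txId); // the caller would now remove the txId from RocksDB
      return true;
    }
    return false;
  }

  /** True while the transaction still occupies an in-memory entry. */
  boolean isTracked(long txId) {
    return acked.containsKey(txId);
  }

  public static void main(String[] args) {
    Set<String> two = new HashSet<>(Arrays.asList("dn1", "dn2"));
    PendingDeleteSketch before = new PendingDeleteSketch(3, false);
    PendingDeleteSketch after = new PendingDeleteSketch(3, true);
    System.out.println("before PR targets: " + before.selectTargets(1L, two));
    System.out.println("after PR targets:  " + after.selectTargets(1L, two));
  }
}
```

With the flag off, a transaction for an under-replicated container stays in the map until a third DN replies; with the flag on, no entry is created until all three replicas exist.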
