ashishkumar50 commented on PR #5293:
URL: https://github.com/apache/ozone/pull/5293#issuecomment-1719589461

   > It feels like we should be storing the DNs we sent the commands to, and 
checking off the replies against what we sent, rather than expecting replies 
from "all the current replica DNs", as the current replicas could have changed 
between when the command was issued and when the replies were received.
   
   SCM already keeps track, in memory, of which DNs a command has been sent to 
and which DNs have responded, so that it does not retry on a DN that has 
already responded successfully.
   This 
[in-memory map](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L423)
 is kept until all the replicas have responded, because only then can we 
remove the deleteTransactionID from RocksDB.
   If the current replicas change after the command is sent, the next retry 
sends the delete request to the changed replica DN.
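
   A minimal sketch of this bookkeeping, for illustration only. The class 
`DeleteAckTracker` and its methods are hypothetical and are not the actual 
`DeletedBlockLogImpl` code; it also assumes that "all replicas have responded" 
means the number of acks has reached the replication factor.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the real SCM implementation.
final class DeleteAckTracker {
  // txID -> DNs that have already acknowledged the delete for that transaction.
  private final Map<Long, Set<String>> ackedDns = new HashMap<>();

  /** DNs among the current replicas that still need the delete command on this retry. */
  Set<String> pendingDns(long txId, Set<String> currentReplicas) {
    Set<String> pending = new HashSet<>(currentReplicas);
    pending.removeAll(ackedDns.getOrDefault(txId, Set.of()));
    return pending;
  }

  /**
   * Records an ack from a DN. Returns true once requiredAcks distinct DNs have
   * responded; the entry is then dropped from memory, and the caller could go on
   * to remove the transaction ID from the persistent store.
   */
  boolean recordAck(long txId, String dn, int requiredAcks) {
    Set<String> acked = ackedDns.computeIfAbsent(txId, k -> new HashSet<>());
    acked.add(dn);
    if (acked.size() >= requiredAcks) {
      ackedDns.remove(txId);
      return true;
    }
    return false;
  }

  /** True while the transaction still has an in-memory entry. */
  boolean isTracked(long txId) {
    return ackedDns.containsKey(txId);
  }
}
```

   With two replicas acking and a required count of 3, the entry stays 
tracked; when a third replica appears, `pendingDns` returns only that DN, so 
the DNs that already responded are not retried.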
   
   > I think there should be a timeout, after which it discards the pending 
commands, and sends again to the current replica list and waits for a reply 
there. What do you think?
   
   On every retry it takes the replicas from 
[containerManager](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L271)
 and sends the delete request to those DNs, so it always uses the current 
replicas.
   
   > What about, we have 3 replicas. We send 3 deletes, then one node goes down 
making it under replicated. We never get a reply from that node. Replication 
makes a 3rd copy before checking. This was not under-replicated at the start 
and the change here would not help I think.
   
   This behaviour is the same before and after this change. During a retry it 
will send the delete request to the new replica.
   
   
   ### Current behaviour:
   Assume replication factor: 3.
   If only 2 replicas are currently present, the delete is sent to those 2 DNs. 
The entry remains in 
memory ([transactionToDNsCommitMap](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L282))
 because it requires 3 responses.
   After some number of retries, if another replica comes up, the delete is 
sent to the third replica DN.
   The entry is then removed from memory and then from RocksDB.
   
   ### Behaviour after this PR:
   Assume replication factor: 3.
   If only 1-2 replicas are currently present, wait until the replica count 
reaches the replication factor before sending the delete.
   Once the replication factor is satisfied, send the delete to all 3 DNs.
   In most cases this avoids keeping entries in 
[transactionToDNsCommitMap](https://github.com/apache/ozone/blob/742734b00603e9ce9aea24a231a594c0cbc56604/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L282)
 for a long time.
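
   The difference between the two behaviours can be sketched roughly as 
follows. `DeleteSendGate` and both method names are hypothetical, chosen only 
to illustrate the rule described above; treating "reaches the replication 
factor" as `>=` is also an assumption.

```java
import java.util.Set;

// Hypothetical sketch of the send rule; not code from the PR.
final class DeleteSendGate {
  private DeleteSendGate() { }

  /** Current behaviour as described: send to whichever replicas exist right now. */
  static Set<String> targetsBefore(Set<String> currentReplicas, int replicationFactor) {
    return Set.copyOf(currentReplicas);
  }

  /** Behaviour after the PR as described: send nothing until enough replicas exist. */
  static Set<String> targetsAfter(Set<String> currentReplicas, int replicationFactor) {
    // Assumption: "reaches the replication factor" is modelled as >=.
    return currentReplicas.size() >= replicationFactor
        ? Set.copyOf(currentReplicas)
        : Set.of();
  }
}
```

   With 2 of 3 replicas present, `targetsBefore` returns both DNs (so an entry 
would sit in memory waiting for the third response), while `targetsAfter` 
returns an empty set and the transaction is simply picked up again later.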
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

