sodonnel commented on PR #5293:
URL: https://github.com/apache/ozone/pull/5293#issuecomment-1719495667

   OK - so say a container is under-replicated. We send out 2 deletes to the 2 
DNs. Then a 3rd replica appears. It was not sent the delete, but now the 
logic expects 3 replies and it will never get that 3rd reply. In 
this case, could you argue that after some reasonable timeout, the delete 
should be sent again? If a DN receives a delete for a block it does not have, it 
will reply with an OK, right? So after a timeout we try again and it cleans up.
   
   What about the case where we have 3 replicas? We send 3 deletes, then one 
node goes down, making the container under-replicated. We never get a reply 
from that node. Replication makes a 3rd copy before the replies are checked. 
The container was not under-replicated at the start, so I think the change 
here would not help.
   
   It feels like we should be storing the DNs we sent the commands to, and 
checking off the replies against what we sent, rather than expecting replies 
from "all the current replica DNs", as the current replicas could have changed 
between when the command was issued and when the replies were received.
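
   A rough sketch of the "check off replies against what we sent" idea, just to 
make it concrete. `PendingDeleteAcks` and the string DN IDs are hypothetical 
names for illustration, not existing Ozone classes, and the real code would 
probably key on datanode details rather than strings.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not an existing Ozone class.
final class PendingDeleteAcks {
  // Datanodes the delete was actually sent to that have not replied yet.
  private final Set<String> awaiting;

  PendingDeleteAcks(Set<String> sentTo) {
    // Copy the targets so later changes to the replica list do not affect us.
    this.awaiting = new HashSet<>(sentTo);
  }

  // Check off a reply. Returns true only if this datanode was one we were
  // waiting on; replies from other datanodes are ignored.
  boolean onReply(String datanodeId) {
    return awaiting.remove(datanodeId);
  }

  // Complete once every datanode we sent to has replied, regardless of
  // which replicas exist now.
  boolean isComplete() {
    return awaiting.isEmpty();
  }
}
```

   With this shape, sending to `dn1` and `dn2` and then seeing a new replica on 
`dn3` does not block completion: the two replies we asked for are enough.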
   
   I think there should be a timeout, after which it discards the pending 
commands, and sends again to the current replica list and waits for a reply 
there. What do you think?
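
   Something like the following is what I have in mind for the timeout, as a 
minimal sketch only. `DeleteRetryTracker`, its method names, and the string DN 
IDs are all made up for illustration; the caller is assumed to dispatch the 
actual delete commands whenever `send` or a successful `resendIfExpired` 
records a new target list.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not an existing Ozone class.
final class DeleteRetryTracker {
  private final Duration timeout;
  private Set<String> awaiting = new HashSet<>();
  private Instant sentAt; // null until the first send

  DeleteRetryTracker(Duration timeout) {
    this.timeout = timeout;
  }

  // Record a (re)send to the given targets, discarding earlier pending state.
  void send(Set<String> targets, Instant now) {
    awaiting = new HashSet<>(targets);
    sentAt = now;
  }

  void onReply(String datanodeId) {
    awaiting.remove(datanodeId);
  }

  boolean isComplete() {
    return sentAt != null && awaiting.isEmpty();
  }

  // If replies are still outstanding once the timeout has elapsed, drop the
  // pending set and start again with the current replicas.
  // Returns true if a resend was recorded.
  boolean resendIfExpired(Set<String> currentReplicas, Instant now) {
    if (sentAt == null || awaiting.isEmpty()) {
      return false;
    }
    if (Duration.between(sentAt, now).compareTo(timeout) < 0) {
      return false;
    }
    send(currentReplicas, now);
    return true;
  }
}
```

   Resending to nodes that already deleted the block relies on the DN replying 
OK for a block it does not have, as discussed above.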


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

