sodonnel commented on PR #4618:
URL: https://github.com/apache/ozone/pull/4618#issuecomment-1523001506

   It's worth noting here that there have been changes in the Replication 
Manager which mean it will no longer keep adding to the replication queue 
on the datanode, so problems where thousands of replication commands end up on 
the datanode should not happen any more. Also, the datanode now drops both 
replication and delete container commands if they are too old.
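
   To make the two behaviours above concrete (a cap on queued commands, and 
dropping commands that are too old), here is a minimal sketch. The names 
`BoundedCommandQueue`, `Command`, `deadlineMs` and `maxSize` are hypothetical 
and are not taken from the Ozone code base; this only illustrates the idea, 
not how the datanode or Replication Manager actually implement it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch only: these are not Ozone's actual classes. */
final class BoundedCommandQueue {

  /** A queued command carrying an absolute deadline in milliseconds. */
  static final class Command {
    final String type;
    final long deadlineMs;

    Command(String type, long deadlineMs) {
      this.type = type;
      this.deadlineMs = deadlineMs;
    }
  }

  private final Deque<Command> queue = new ArrayDeque<>();
  private final int maxSize;

  BoundedCommandQueue(int maxSize) {
    this.maxSize = maxSize;
  }

  /** Rejects the command (returns false) when the queue is already full. */
  boolean offer(Command command) {
    if (queue.size() >= maxSize) {
      return false;
    }
    queue.addLast(command);
    return true;
  }

  /**
   * Returns the next command whose deadline has not passed, discarding any
   * expired commands found ahead of it. Returns null if none remain.
   */
  Command poll(long nowMs) {
    Command command;
    while ((command = queue.pollFirst()) != null) {
      if (command.deadlineMs >= nowMs) {
        return command;
      }
      // Deadline has passed: drop the command and keep looking.
    }
    return null;
  }

  int size() {
    return queue.size();
  }
}
```

   Passing the current time into `poll` rather than reading the clock inside 
it is a deliberate choice in this sketch, since it keeps the limit and expiry 
behaviour easy to check in a simple unit test.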
   
   I believe you saw some scenarios where the DN queue held a lot of messages, 
resulting in a crash. Do you have a breakdown of which commands were queued 
and where? E.g. replication commands on the replication queue. What about the 
commandQueue, where commands get placed first - was it also suffering from 
many messages stuck on it?
   
   I don't see any new tests in this PR - if we are going to have logic to 
enforce limits we probably need a couple of simple tests to check they are 
working ok.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

