smengcl commented on PR #6014:
URL: https://github.com/apache/ozone/pull/6014#issuecomment-2145835103

   @ivandika3 
   
   > The idea looks good. However, we might need to reduce the Ratis server watch 
timeout configuration to trigger `NotReplicatedException`.
   
   Yes indeed, the current Raft server (datanode) `raft.server.watch.timeout` is 
far too high to trigger `NotReplicatedException`. In my testing, as long as the 
DN's `raft.server.watch.timeout` is set lower than the client's 
`raft.client.rpc.request.timeout`, the integration test triggers 
`NotReplicatedException`. So I intend to adjust the timeouts in this PR as well.
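
   The ordering constraint above can be sketched as a small check. This is only 
an illustration: the two property names come from the comment, while the class 
name, the helper method, and the duration values are hypothetical and are not 
taken from Ozone or Ratis.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class WatchTimeoutCheck {
    // Property names as quoted in the comment above.
    static final String SERVER_WATCH_TIMEOUT = "raft.server.watch.timeout";
    static final String CLIENT_RPC_REQUEST_TIMEOUT = "raft.client.rpc.request.timeout";

    /**
     * Returns true when the datanode watch timeout is strictly lower than the
     * client RPC request timeout, i.e. the watch request can time out on the
     * server before the client gives up on the RPC.
     */
    static boolean watchTimesOutFirst(Map<String, Duration> conf) {
        Duration watch = conf.get(SERVER_WATCH_TIMEOUT);
        Duration rpc = conf.get(CLIENT_RPC_REQUEST_TIMEOUT);
        if (watch == null || rpc == null) {
            throw new IllegalArgumentException("both timeouts must be set");
        }
        return watch.compareTo(rpc) < 0;
    }

    public static void main(String[] args) {
        // Hypothetical values, chosen only to illustrate the ordering.
        Map<String, Duration> conf = new HashMap<>();
        conf.put(SERVER_WATCH_TIMEOUT, Duration.ofSeconds(10));
        conf.put(CLIENT_RPC_REQUEST_TIMEOUT, Duration.ofSeconds(30));
        System.out.println("watch times out first: " + watchTimesOutFirst(conf));
    }
}
```

   A test setup could run a check like this before starting the cluster, so that 
a misordered pair of timeouts fails fast instead of producing a test that never 
sees the exception.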
   
   > Additionally, `cluster.shutdownHddsDatanode(datanodes.get(0))` might 
shut down the current datanode Ratis leader, which I think will not throw 
`NotReplicatedException` but another exception (e.g. `NotLeaderException`, or 
`GroupMismatchException` if the pipeline is closed due to 
`ContainerStateMachine#notifyLeaderChanged` after a new election). In that case, 
will the datanode be excluded eventually? I think the write might be retried in 
`KeyOutputStream`, which will exclude it as well, but maybe we can make sure to 
shut down a follower datanode instead of the leader.
   
   Good idea. Let me check.
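
   One way to pick a follower rather than the leader, sketched with plain 
strings. Everything here is hypothetical: the class, the method, and the node 
IDs are illustrative only, and how the test would actually look up the current 
Ratis leader of the pipeline is not shown.

```java
import java.util.List;
import java.util.Optional;

public class FollowerPicker {
    /**
     * Returns the first datanode ID in the pipeline that is not the leader,
     * or an empty Optional if every entry equals the leader ID.
     */
    static Optional<String> pickFollower(List<String> datanodeIds, String leaderId) {
        return datanodeIds.stream()
            .filter(id -> !id.equals(leaderId))
            .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical three-node pipeline whose leader happens to be first,
        // which is the case where datanodes.get(0) would stop the leader.
        List<String> pipeline = List.of("dn-1", "dn-2", "dn-3");
        System.out.println(pickFollower(pipeline, "dn-1").orElse("none"));
    }
}
```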


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

