DaveTeng0 opened a new pull request, #5530:
URL: https://github.com/apache/ozone/pull/5530

   ## What changes were proposed in this pull request?
   Currently, a long-lived client can add most or all datanodes of a small 
cluster to its exclude list, after which further writes through that client 
instance fail. This PR adds a timeout for RATIS keys so that datanodes are 
removed from the exclude list once the timeout expires and can be retried later. 
   (For EC keys, this mechanism already exists and defaults to 10 minutes.)
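
   To illustrate the idea, here is a minimal sketch of an exclude list whose entries expire. It is not the code in this PR: the class `ExpiringExcludeList`, its methods, and the injected clock are hypothetical names chosen for illustration, and the 10-minute figure used in the usage note simply borrows the EC default mentioned above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

/** Hypothetical sketch of an exclude list with per-entry expiry (not Ozone's class). */
class ExpiringExcludeList {
  // Datanode identifier -> time (in milliseconds) at which it was excluded.
  private final Map<String, Long> excludedAt = new HashMap<>();
  // Entries older than this are dropped; a value <= 0 means entries never expire.
  private final long expiryMillis;
  // Injected time source so that expiry can be exercised without sleeping.
  private final LongSupplier clock;

  ExpiringExcludeList(long expiryMillis, LongSupplier clock) {
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  /** Records a datanode as excluded, stamped with the current time. */
  synchronized void add(String datanode) {
    excludedAt.put(datanode, clock.getAsLong());
  }

  /** Returns the datanodes that are still excluded, purging expired entries first. */
  synchronized Set<String> getExcluded() {
    if (expiryMillis > 0) {
      long now = clock.getAsLong();
      excludedAt.values().removeIf(t -> now - t >= expiryMillis);
    }
    return new HashSet<>(excludedAt.keySet());
  }
}
```

   With `new ExpiringExcludeList(600_000L, System::currentTimeMillis)`, a datanode added after a failed write would drop out of `getExcluded()` ten minutes later, so a persistent client becomes free to try it again.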
   
   This improvement is especially relevant for the S3 Gateway, which uses a 
persistent Ozone client to connect to the cluster.
   
   A second part of this improvement allows a write to fall back to datanodes 
in the exclude list when they are the only ones available. It would be 
implemented as a client-side retry triggered by the server's initial error 
response.
   
   That part of the work is handled in a separate PR:  
https://github.com/apache/ozone/pull/5514
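
   The fallback described above can be sketched roughly as follows. This is only an assumption about the general shape of such a retry, not the implementation in either PR: `FallbackAllocator`, `pick`, and `pickWithFallback` are hypothetical names, and a simple in-process filter stands in for the server's allocation request and its error response.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of "retry without the exclude list" (not Ozone's code). */
class FallbackAllocator {
  /** Stand-in for the server-side choice: first candidate not in the exclude set. */
  static Optional<String> pick(List<String> nodes, Set<String> excluded) {
    return nodes.stream().filter(n -> !excluded.contains(n)).findFirst();
  }

  /** First honours the exclude list; if nothing is left, retries while ignoring it. */
  static Optional<String> pickWithFallback(List<String> nodes, Set<String> excluded) {
    Optional<String> first = pick(nodes, excluded);
    if (first.isPresent()) {
      return first;
    }
    // Every candidate was excluded: fall back to the full set of datanodes.
    return pick(nodes, Collections.emptySet());
  }
}
```

   In this sketch the write still fails only when there are no datanodes at all, which is the behaviour the fallback aims for on small clusters.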
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-9323
   
   ## How was this patch tested?
   unit test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

