Ethan Rose created HDDS-9323:
--------------------------------
Summary: Better datanode exclude list handling for long-lived clients
Key: HDDS-9323
URL: https://issues.apache.org/jira/browse/HDDS-9323
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Client
Reporter: Ethan Rose
Currently, a long-lived client can add most or all nodes of a small cluster
to its exclude list, after which further writes using that client instance
will fail. There are two ways this can be improved:
# A timeout that removes nodes from the exclude list after a set period so
that they can be retried. For EC, this exists and is set to 10 minutes by
default. Ratis does not currently have this, but it should be added.
# Allow the write to fall back to nodes in the exclude list if that is all
that is available. This could be implemented on the server side, or as a retry
from the client based on the server's initial response.
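As a rough illustration of the two ideas above, here is a minimal client-side sketch in Java. The class name ExpiringExcludeSketch, its methods, and the use of plain strings as node IDs are all hypothetical and chosen for this example only; it is not the Ozone client's actual exclude list implementation, and it assumes the fallback is decided on the client rather than on the server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch only; not the Ozone client's real exclude list class.
final class ExpiringExcludeSketch {
  // Node ID -> time (ms) at which the node was excluded.
  private final Map<String, Long> excludedAtMillis = new HashMap<>();
  private final long expiryMillis;
  private final LongSupplier clock; // injectable so expiry can be tested

  ExpiringExcludeSketch(long expiryMillis, LongSupplier clock) {
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  void exclude(String nodeId) {
    excludedAtMillis.put(nodeId, clock.getAsLong());
  }

  // Idea 1: an exclusion lapses once expiryMillis has elapsed.
  boolean isExcluded(String nodeId) {
    Long since = excludedAtMillis.get(nodeId);
    if (since == null) {
      return false;
    }
    if (clock.getAsLong() - since >= expiryMillis) {
      excludedAtMillis.remove(nodeId); // expired, so the node may be retried
      return false;
    }
    return true;
  }

  // Idea 2: prefer non-excluded nodes, but fall back to every node
  // when the exclude list would otherwise leave nothing to write to.
  List<String> candidates(List<String> allNodes) {
    List<String> preferred = new ArrayList<>();
    for (String node : allNodes) {
      if (!isExcluded(node)) {
        preferred.add(node);
      }
    }
    return preferred.isEmpty() ? new ArrayList<>(allNodes) : preferred;
  }
}
```

With an expiry of 600000 ms (matching the 10 minute EC default mentioned above) and a three-node cluster, excluding all three nodes would make candidates() return all three again instead of an empty list, and each node would drop off the exclude list on its own once 10 minutes have passed.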
These issues are especially relevant for the S3 gateway, which uses a
persistent Ozone client to connect to the cluster while it is up.