Jing Zhao created HDFS-5399:
-------------------------------

             Summary: Revisit SafeModeException and corresponding retry policies
                 Key: HDFS-5399
                 URL: https://issues.apache.org/jira/browse/HDFS-5399
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 3.0.0
            Reporter: Jing Zhao
            Assignee: Jing Zhao

Currently, for NN SafeMode, we have the following retry policies:
# In a non-HA setup, the client will retry if the NN is in SafeMode. Specifically, the client-side RPC adopts the MultipleLinearRandomRetry policy if a SafeModeException is wrapped in a RemoteException.
# In an HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped as a RetriableException on the server side. The client-side RPC uses the FailoverOnNetworkExceptionRetry policy, which recognizes RetriableException (see HDFS-5291).

There are several issues in the current implementation:
# The NN SafeMode can be a "Manual" SafeMode (i.e., started by an administrator through the CLI), and clients may not want to retry on this type of SafeMode.
# We should have a single generic strategy to address the mapping between SafeMode and retry policy for both HA and non-HA setups.

A possible straightforward solution is to always wrap the SafeModeException in a RetriableException to indicate that clients should retry.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
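
For illustration only, here is a minimal, self-contained sketch of what a single wrapping rule could look like. It does not use Hadoop's classes: SafeModeException and RetriableException below are hypothetical stand-ins defined inside the sketch, and the "manual" flag, toClientException, and shouldRetry are assumed names, not existing APIs. The sketch also assumes one way of combining the two issues above: the server wraps only non-manual SafeMode errors, so the client needs a single check regardless of HA or non-HA.

```java
import java.io.IOException;

public class SafeModeRetrySketch {

    /** Hypothetical stand-in for a SafeMode error; not Hadoop's SafeModeException. */
    public static class SafeModeException extends IOException {
        public final boolean manual; // assumed flag: true if an administrator entered SafeMode

        public SafeModeException(String message, boolean manual) {
            super(message);
            this.manual = manual;
        }
    }

    /** Hypothetical stand-in marker meaning "the client should retry this call". */
    public static class RetriableException extends IOException {
        public RetriableException(IOException cause) {
            super(cause.getMessage(), cause);
        }
    }

    /**
     * Server side (assumed policy): wrap automatic SafeMode errors so the client
     * retries, and pass manual SafeMode errors through unwrapped.
     */
    public static IOException toClientException(SafeModeException e) {
        return e.manual ? e : new RetriableException(e);
    }

    /** Client side: one rule for both HA and non-HA setups. */
    public static boolean shouldRetry(IOException e, int attempts, int maxAttempts) {
        return e instanceof RetriableException && attempts < maxAttempts;
    }

    public static void main(String[] args) {
        IOException startup = toClientException(new SafeModeException("startup SafeMode", false));
        IOException manual = toClientException(new SafeModeException("manual SafeMode", true));
        System.out.println("retry on startup SafeMode: " + shouldRetry(startup, 0, 3));
        System.out.println("retry on manual SafeMode: " + shouldRetry(manual, 0, 3));
    }
}
```

With this shape the retry decision lives in one place on the server, and the client-side policy only has to recognize the wrapper. Whether manual SafeMode should be excluded, as assumed here, is exactly the open question in this issue.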