Jing Zhao created HDFS-5399:
-------------------------------
Summary: Revisit SafeModeException and corresponding retry policies
Key: HDFS-5399
URL: https://issues.apache.org/jira/browse/HDFS-5399
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Currently, we have the following retry policies for NameNode (NN) SafeMode:
# In a non-HA setup, the client retries if the NN is in SafeMode.
Specifically, the client-side RPC adopts the MultipleLinearRandomRetry policy
if a SafeModeException is wrapped in a RemoteException.
# In an HA setup, the client retries if the NN is Active and in SafeMode.
Specifically, the SafeModeException is wrapped in a RetriableException on the
server side. The client-side RPC uses the FailoverOnNetworkExceptionRetry
policy, which recognizes RetriableException (see HDFS-5291).
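The two paths above can be summarized in a small Java sketch. It is illustrative only: the classes are simplified local stand-ins, not the real org.apache.hadoop classes, and shouldRetry is a hypothetical helper that models just the exception check described above, not the timing or failover behavior of either policy.

```java
// Simplified local stand-ins; these are NOT the real Hadoop classes.
class CurrentSafeModeRetrySketch {
    /** Marker for "the NN is in SafeMode". */
    static class SafeModeException extends java.io.IOException {
        SafeModeException(String msg) { super(msg); }
    }

    /** Marker for "the client may retry this call". */
    static class RetriableException extends java.io.IOException {
        RetriableException(String msg) { super(msg); }
    }

    /** Stand-in for RemoteException: carries the server-side exception's class name. */
    static class RemoteException extends java.io.IOException {
        final String className;
        RemoteException(String className, String msg) {
            super(msg);
            this.className = className;
        }
    }

    /**
     * Hypothetical helper: does the client retry, given the setup and the
     * exception it received? Models only the checks described in this issue.
     */
    static boolean shouldRetry(boolean haEnabled, RemoteException e) {
        if (haEnabled) {
            // HA path: the server wrapped the SafeModeException in a
            // RetriableException, and the failover policy recognizes that type.
            return RetriableException.class.getName().equals(e.className);
        }
        // Non-HA path: the client looks for a SafeModeException inside the
        // RemoteException and applies its linear random retry policy.
        return SafeModeException.class.getName().equals(e.className);
    }
}
```

In this model the two setups key on different exception types, so the same server response is retried in one setup and not in the other.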
There are several issues in the current implementation:
# The NN SafeMode can be a "Manual" SafeMode (i.e., started by an administrator
through the CLI), and clients may not want to retry on this type of SafeMode.
# We should have a single generic strategy for mapping SafeMode to a retry
policy in both HA and non-HA setups. A straightforward solution is to always
wrap the SafeModeException in a RetriableException to indicate that clients
should retry.
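One way the proposal could look is sketched below in Java. This is an assumption-laden illustration, not the eventual patch: the classes are local stand-ins rather than the real Hadoop classes, the manual flag and the toClientException and shouldRetry helpers are hypothetical, and skipping the wrap for manual SafeMode is only one possible way to address the first issue.

```java
// Local stand-ins and hypothetical helpers; NOT the real Hadoop classes or API.
class ProposedSafeModeRetrySketch {
    static class SafeModeException extends java.io.IOException {
        final boolean manual; // hypothetical: true if an administrator entered SafeMode
        SafeModeException(String msg, boolean manual) {
            super(msg);
            this.manual = manual;
        }
    }

    static class RetriableException extends java.io.IOException {
        RetriableException(Exception cause) { super(cause); }
    }

    /**
     * Server side (hypothetical): decide what to send to the client.
     * Automatic SafeMode is wrapped so the client retries; manual SafeMode is
     * passed through unwrapped so the client fails fast.
     */
    static java.io.IOException toClientException(SafeModeException e) {
        return e.manual ? e : new RetriableException(e);
    }

    /** Client side (hypothetical): one check shared by HA and non-HA setups. */
    static boolean shouldRetry(java.io.IOException e) {
        return e instanceof RetriableException;
    }
}
```

With a single wrapping decision on the server, the client no longer needs SafeMode-specific logic in either retry policy; it only needs to recognize RetriableException.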