CR Hota created HDFS-14588:
------------------------------
Summary: Client retries Standby NN continuously even if Active NN
is available (WebHDFS)
Key: HDFS-14588
URL: https://issues.apache.org/jira/browse/HDFS-14588
Project: Hadoop HDFS
Issue Type: Bug
Reporter: CR Hota
This is a behavior we have observed in our HA setup of HDFS.
# Active NN is up and serving traffic.
# Standby NN is restarted for maintenance.
# After step 2, all new clients (WebHDFS only) that connect to the Standby NN
keep seeing Retriable Exception, because the Standby NN has not finished
starting (the RPC server is yet to come up while the FS image is loading) but
the HTTP server is already started and accepting traffic. This keeps happening
until the RPC server is up and the SNN knows that it is truly standby. How long
this lasts depends on start-up time, which is high (many minutes) for big
clusters.
This behavior causes low availability of HDFS even though HDFS is actually
still available.
Ideally, WebHDFS should throw a standby exception (if HA is enabled) so that
clients then connect to the active NN. If the active is also not available,
clients will bounce between the NNs and automatically connect to the right
active.
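
The difference between the observed and the proposed behavior can be sketched with a small, self-contained model. This is not the WebHDFS client code: the class `WebHdfsFailoverSketch`, the `Reply` enum, and the `call` method are hypothetical names used only for illustration. The model assumes, as described above, that a client retries the same NN on a retriable reply and moves to the other NN on a standby reply.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the client loop; not actual Hadoop code.
class WebHdfsFailoverSketch {

    /** Simplified reply a NameNode's HTTP endpoint gives to a request. */
    enum Reply { OK, RETRIABLE, STANDBY }

    /** Which NameNode served the request (-1 if none) and how many attempts were made. */
    static final class Outcome {
        final int servedBy;
        final int attempts;

        Outcome(int servedBy, int attempts) {
            this.servedBy = servedBy;
            this.attempts = attempts;
        }
    }

    /**
     * Simulates one client request against a fixed list of NameNode replies.
     * RETRIABLE: retry the same NameNode. STANDBY: fail over to the next one.
     */
    static Outcome call(List<Reply> nameNodes, int start, int maxAttempts) {
        int current = start;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Reply reply = nameNodes.get(current);
            switch (reply) {
                case OK:
                    return new Outcome(current, attempt);
                case STANDBY:
                    // Proposed behavior: the client moves on to the other NameNode.
                    current = (current + 1) % nameNodes.size();
                    break;
                case RETRIABLE:
                default:
                    // Observed behavior: the client stays on the same NameNode.
                    break;
            }
        }
        return new Outcome(-1, maxAttempts);
    }

    public static void main(String[] args) {
        // NN 0 is the restarting standby, NN 1 is the healthy active.
        Outcome observed = call(Arrays.asList(Reply.RETRIABLE, Reply.OK), 0, 5);
        Outcome proposed = call(Arrays.asList(Reply.STANDBY, Reply.OK), 0, 5);
        System.out.println("observed: servedBy=" + observed.servedBy
                + " attempts=" + observed.attempts);
        System.out.println("proposed: servedBy=" + proposed.servedBy
                + " attempts=" + proposed.attempts);
    }
}
```

In this model the first call exhausts its attempts on the restarting NN even though the active NN would have answered, while the second call reaches the active NN on its second attempt.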