CR Hota created HDFS-14588:
------------------------------

Summary: Client retries Standby NN continuously even if Active NN is available (WebHDFS)
Key: HDFS-14588
URL: https://issues.apache.org/jira/browse/HDFS-14588
Project: Hadoop HDFS
Issue Type: Bug
Reporter: CR Hota
We have observed this behavior in our HA setup of HDFS:

# Active NN is up and serving traffic.
# Standby NN is restarted for maintenance.
# After step 2, all new clients (WebHDFS only) that connect to the Standby NN keep receiving RetriableException. The Standby NN has not finished starting: its RPC server is not up yet because the FSImage is still loading, but its HTTP server has already started and is accepting traffic. This continues until the RPC server is up and the NN knows that it is truly in standby state. Start-up time is high for big clusters (many minutes), so this can last a long time.

This behavior lowers the availability of HDFS even though HDFS is actually still available through the Active NN. Ideally, WebHDFS should throw StandbyException (if HA is enabled) and let clients fail over to the active NN. If the active NN is also unavailable, clients will bounce between NameNodes and automatically connect to the right active one.
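The difference between the two exceptions can be sketched with a small model. It assumes a client that retries the same NameNode on a RetriableException and moves to the next NameNode on a StandbyException. The exception classes, the `NnState` enum, `checkOperation`, `sendRequest`, and the `standbyWhileStarting` flag are all hypothetical stand-ins written for this illustration, not Hadoop's actual classes or WebHDFS code.

```java
import java.util.List;

/**
 * Hypothetical model of a WebHDFS client choosing between two NameNodes.
 * The exception names mirror Hadoop's, but these are local stand-ins.
 */
public class FailoverSketch {

    /** Stand-in meaning "this NN is not active; try another one". */
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    /** Stand-in meaning "this NN is temporarily unable to serve; retry it". */
    static class RetriableException extends Exception {
        RetriableException(String msg) { super(msg); }
    }

    /** Simplified NN state as seen by its (already running) HTTP server. */
    enum NnState { STARTING, STANDBY, ACTIVE }

    /**
     * Server-side check. standbyWhileStarting = true models the behavior
     * proposed in the issue (assumes HA is enabled); false models the
     * reported behavior of answering RetriableException during start-up.
     */
    static void checkOperation(NnState state, boolean standbyWhileStarting)
            throws StandbyException, RetriableException {
        if (state == NnState.STARTING) {
            if (standbyWhileStarting) {
                throw new StandbyException("NameNode still starting");
            }
            throw new RetriableException("NameNode still starting");
        }
        if (state == NnState.STANDBY) {
            throw new StandbyException("Operation not supported in standby");
        }
    }

    /**
     * Client loop: retry the same NN on RetriableException, fail over to the
     * next NN on StandbyException. Returns the index of the NN that served
     * the request, or -1 if maxAttempts is exhausted.
     */
    static int sendRequest(List<NnState> nns, int start,
                           boolean standbyWhileStarting, int maxAttempts) {
        int current = start;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                checkOperation(nns.get(current), standbyWhileStarting);
                return current;
            } catch (RetriableException e) {
                // Stays on the same NN, which is still loading its image.
            } catch (StandbyException e) {
                current = (current + 1) % nns.size();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // NN 0 was just restarted and is still starting; NN 1 is active.
        List<NnState> nns = List.of(NnState.STARTING, NnState.ACTIVE);
        System.out.println("reported: " + sendRequest(nns, 0, false, 5));
        System.out.println("proposed: " + sendRequest(nns, 0, true, 5));
    }
}
```

In this model the reported behavior exhausts every attempt against the restarting NN (prints `reported: -1`), while the proposed behavior fails over on the first response and is served by the active NN (prints `proposed: 1`).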