James Clampffer created HDFS-11014:
--------------------------------------
Summary: libhdfs++: Make connection to HA clusters faster
Key: HDFS-11014
URL: https://issues.apache.org/jira/browse/HDFS-11014
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: James Clampffer
Assignee: James Clampffer
Priority: Minor
Right now, when we get a StandbyException from the NN, we inject a 20-second
delay before trying the alternate NN, even if it's the first failover. The
first failover shouldn't have a delay (the Java client skips the delay on the
first failover).
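
A minimal sketch of the proposed delay rule, for illustration only. The
function name FailoverDelay and its parameters are hypothetical and are not
taken from the libhdfs++ code; the only behavior assumed is what is described
above (no wait on the first failover, the usual wait afterwards).

```cpp
#include <chrono>

// Hypothetical helper, not an existing libhdfs++ function.
// failovers_so_far: number of failovers already performed on this call.
// base_delay: the wait currently injected after a StandbyException
//             (20 seconds in the behavior described above).
std::chrono::milliseconds FailoverDelay(int failovers_so_far,
                                        std::chrono::milliseconds base_delay) {
  // First failover: try the alternate NN right away.
  if (failovers_so_far == 0) {
    return std::chrono::milliseconds(0);
  }
  // Later failovers: keep the existing fixed delay.
  return base_delay;
}
```

With a 20-second base delay this returns 0 ms for the first failover and
20000 ms for every one after it.
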
Another minor change I'd like to make is to reduce the default number of
failover attempts from 15 (used in the Apache config) to 4. My impression is
that a higher number of failovers is really handy for longer-running batch
jobs, but in the libhdfs++ case the client is often an interactive application.
In that case it's generally preferable to fail sooner so a user doesn't have to
wait ~8 minutes for a timeout when using the default settings.
The choice of 4 failovers assumes that if we can't connect immediately, either
there is a GC pause that will most likely have finished before the second
connection attempt, or there is a network or config issue that an admin will
need to sort out. It would still be possible to override these values in the
config for further tuning if a specific deployment tends to have more or fewer
network issues.
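
To make the trade-off concrete, here is a rough sketch that adds up only the
injected failover delays for a given setting. FailoverSettings and
WorstCaseInjectedDelay are hypothetical names, not libhdfs++ options, and the
sum ignores connection and RPC timeouts, so it is not expected to reproduce
the ~8 minute figure above.

```cpp
#include <chrono>

// Hypothetical settings bundle; the field names are illustrative only.
struct FailoverSettings {
  int max_failovers = 4;           // proposed default (currently 15)
  std::chrono::seconds delay{20};  // delay injected before a failover
  bool skip_first_delay = true;    // proposed: no wait on the first failover
};

// Upper bound on time spent purely in injected delays before giving up.
// Connection and RPC timeouts are deliberately left out.
std::chrono::seconds WorstCaseInjectedDelay(const FailoverSettings& s) {
  if (s.max_failovers <= 0) {
    return std::chrono::seconds(0);
  }
  // When the first failover is immediate, one fewer delay is paid.
  const int delayed = s.skip_first_delay ? s.max_failovers - 1
                                         : s.max_failovers;
  return s.delay * delayed;
}
```

Under these assumptions, 15 failovers with a delay on every one spend 300
seconds waiting, while 4 failovers with the first delay skipped spend 60
seconds.
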