James Clampffer created HDFS-11014:

             Summary: libhdfs++: Make connection to HA clusters faster
                 Key: HDFS-11014
                 URL: https://issues.apache.org/jira/browse/HDFS-11014
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: James Clampffer
            Assignee: James Clampffer
            Priority: Minor

Right now, when we get a StandbyException from the NN, we inject a 20-second
delay before trying the alternate NN, even if it's the first failover.  The
first failover shouldn't have a delay (the Java client skips the delay on the
first failover).
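
A minimal sketch of the proposed delay rule, assuming failovers are counted
from 0 and that later failovers keep the current fixed 20-second delay.
`DelayBeforeFailover` is a hypothetical helper for illustration, not the
actual libhdfs++ retry code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical policy sketch, not the real libhdfs++ implementation.
// Returns how long to wait before failover attempt `failover_count`
// (0 = the first failover after a StandbyException).
std::chrono::milliseconds DelayBeforeFailover(
    int failover_count,
    std::chrono::milliseconds retry_delay = std::chrono::seconds(20)) {
  assert(failover_count >= 0);
  if (failover_count == 0) {
    // First failover: the other NN is probably the active one, so try it now.
    return std::chrono::milliseconds(0);
  }
  // Later failovers: both NNs have already been tried, so back off first.
  return retry_delay;
}
```

With this rule, `DelayBeforeFailover(0)` is zero and `DelayBeforeFailover(1)`
is 20 seconds, so a plain active/standby swap costs no added latency.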

Another minor change I'd like to make is to reduce the default number of
failover attempts from 15 (the value used in the Apache config) to 4.  My
impression is that a high failover count is really handy for long-running batch
jobs, but in the libhdfs++ case the client is often an interactive application.
There it's generally preferable to fail sooner, so a user doesn't have to wait
the ~8 minutes it takes to time out with the default settings.

The choice of 4 failovers assumes that if we can't connect immediately, the
cause is either a GC pause, which will most likely be over before the second
connection attempt, or a network or config issue that an admin will need to
sort out.  It would still be possible to override these defaults in the config
for finer tuning if a specific deployment tends to have more or fewer network
issues.
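
A sketch of keeping 4 as the default while letting a deployment override it.
The key name is the standard Hadoop client key `dfs.client.failover.max.attempts`;
how libhdfs++ actually reads it is assumed here, and `FailoverOptions` /
`LoadFailoverOptions` are hypothetical names working on an already-parsed
key/value map.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical options struct; the real libhdfs++ Options class may differ.
struct FailoverOptions {
  int max_failover_attempts = 4;  // proposed interactive-friendly default
};

// Applies a deployment-specific override from parsed config values.
FailoverOptions LoadFailoverOptions(
    const std::map<std::string, std::string>& conf) {
  FailoverOptions opts;
  auto it = conf.find("dfs.client.failover.max.attempts");
  if (it != conf.end()) {
    int value = std::stoi(it->second);  // throws on non-numeric input
    if (value < 0) {
      throw std::invalid_argument("failover attempts must be non-negative");
    }
    opts.max_failover_attempts = value;
  }
  return opts;
}
```

A batch-oriented deployment could set the key back to 15, while interactive
clients that set nothing get 4.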

This message was sent by Atlassian JIRA