[ 
https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343889#comment-14343889
 ] 

Aaron T. Myers commented on HDFS-7858:
--------------------------------------

Hey folks, sorry to come into this discussion so late.

Given that some folks choose to use HDFS HA without auto failover at all, and 
thus without ZKFCs or ZK in sight, I think we should target any solution to 
this problem to work without ZK. I'm also a little leery of using a cache file, 
as I'm afraid of thundering herd effects (if the file is in HDFS or in a home 
dir which is network mounted), and also don't like the fact that in a large 
cluster all users on all machines might need to populate this cache file.

As such, I'd propose that we pursue either of the following two options:

# Optimistically try to connect to both configured NNs simultaneously, thus 
allowing that one (the standby) may take a while to respond, but also expecting 
that the active will always respond rather promptly. This is similar to 
Kihwal's suggestion.
# Have the client connect to the JNs to determine which NN is likely the 
active. In my experience, even those who don't use automatic failover basically 
always use the QJM. I think those that continue to use NFS-based HA are very 
few and far between.
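
To make the first option concrete, here is a minimal sketch of the racing idea in plain Java. It is only an illustration under stated assumptions: `FirstResponder` and `firstSuccessful` are hypothetical names, not existing Hadoop classes, and each `Callable` stands in for "issue the request against one configured NN". A probe that throws (for example, a standby rejecting the call) or that hangs is simply outpaced by the probe that succeeds.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Hadoop): races one probe per configured NN. */
final class FirstResponder {

  private FirstResponder() {}

  /**
   * Runs all probes concurrently and returns the result of the first one
   * that completes without throwing. Probes that fail or are still running
   * when a winner is found are ignored and cancelled.
   *
   * @param probes    one task per configured NN (assumed non-empty)
   * @param timeoutMs overall time limit for getting any successful answer
   */
  static <T> T firstSuccessful(List<Callable<T>> probes, long timeoutMs)
      throws InterruptedException, ExecutionException, TimeoutException {
    // One thread per probe so a slow standby cannot delay the active.
    ExecutorService pool = Executors.newFixedThreadPool(probes.size());
    try {
      // invokeAny returns the first successful result; it throws
      // ExecutionException only if every probe fails.
      return pool.invokeAny(probes, timeoutMs, TimeUnit.MILLISECONDS);
    } finally {
      // Interrupt whatever is still in flight (e.g. the stalled standby).
      pool.shutdownNow();
    }
  }
}
```

A real client would presumably also remember which NN answered, so that later calls go straight to it until the next failover. The cost of this approach is one extra connection attempt against the standby per race.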

Thoughts?

> Improve HA Namenode Failover detection on the client
> ----------------------------------------------------
>
>                 Key: HDFS-7858
>                 URL: https://issues.apache.org/jira/browse/HDFS-7858
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: HDFS-7858.1.patch
>
>
> In an HA deployment, clients are configured with the hostnames of both the 
> Active and Standby Namenodes. A client will first try one of the NNs 
> (non-deterministically), and if it's a standby NN, the NN will respond 
> telling the client to retry the request on the other Namenode.
> If the client happens to talk to the Standby first, and the standby is 
> undergoing some GC / is busy, then the client might not get a response 
> soon enough to try the other NN.
> Proposed approach to solve this:
> 1) Since Zookeeper is already used as the failover controller, the clients 
> could talk to ZK and find out which is the active namenode before contacting 
> it.
> 2) Long-lived DFSClients would have a ZK watch configured which fires when 
> there is a failover, so they do not have to query ZK every time to find out 
> the active NN.
> 3) Clients can also cache the last active NN in the user's home directory 
> (~/.lastNN) so that short-lived clients can try that Namenode first before 
> querying ZK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
