[jira] [Created] (HDFS-6507) Improve DFSAdmin to support HA cluster better

Zesheng Wu (JIRA) Tue, 10 Jun 2014 00:21:17 -0700

Zesheng Wu created HDFS-6507:
--------------------------------

             Summary: Improve DFSAdmin to support HA cluster better
                 Key: HDFS-6507
                 URL: https://issues.apache.org/jira/browse/HDFS-6507
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: tools
    Affects Versions: 2.4.0
            Reporter: Zesheng Wu
            Assignee: Zesheng Wu



Currently, the commands supported in DFSAdmin can be classified into three 
categories according to the protocol used:
1. ClientProtocol
Commands in this category are generally implemented by calling the
corresponding method of the DFSClient class, which in turn invokes the
corresponding remote implementation on the NN side. On the NN side, every
operation is classified into one of five categories: UNCHECKED, READ, WRITE,
CHECKPOINT and JOURNAL. The Active NN allows all operations, while the Standby
NN allows only UNCHECKED operations. In the current implementation, DFSClient
connects to one NN first; if that NN is not Active and the operation is not
allowed, it fails over to the second NN. The problem is that some DFSAdmin
commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes,
setBalancerBandwidth, metaSave) are classified as UNCHECKED operations.
Because a Standby NN never rejects them, these commands, when run from the
DFSAdmin command line, are handled by whichever NN is tried first, regardless
of whether it is Active or Standby. This can cause two problems:
a. If the first NN tried is Standby, the operation takes effect only on the
Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NNs, it takes effect on only
one of them. This may cause problems after a later NN failover.
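
The routing behaviour described above can be illustrated with a small
self-contained model. This is only a sketch of the description in this issue,
not the actual DFSClient or NameNode code: the NameNode class, the Call
interface, the invoke helper and the local StandbyException are hypothetical
stand-ins, and mkdir is used merely as an example of a WRITE operation.

```java
import java.util.Arrays;
import java.util.List;

public class UncheckedRoutingSketch {
    enum OperationCategory { UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL }
    enum HAState { ACTIVE, STANDBY }

    // Local stand-in for the exception a Standby NN raises (hypothetical).
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    // Toy NameNode: Active allows everything, Standby allows only UNCHECKED.
    static class NameNode {
        final String name;
        final HAState state;
        boolean inSafeMode = false;
        int dirsCreated = 0;

        NameNode(String name, HAState state) {
            this.name = name;
            this.state = state;
        }

        void checkOperation(OperationCategory op) throws StandbyException {
            if (state == HAState.STANDBY && op != OperationCategory.UNCHECKED) {
                throw new StandbyException(op + " is not allowed on standby " + name);
            }
        }

        // Example WRITE operation: rejected by a Standby NN.
        void mkdir() throws StandbyException {
            checkOperation(OperationCategory.WRITE);
            dirsCreated++;
        }

        // UNCHECKED operation: accepted by any NN, whatever its state.
        void setSafeMode(boolean on) throws StandbyException {
            checkOperation(OperationCategory.UNCHECKED);
            inSafeMode = on;
        }
    }

    interface Call {
        void run(NameNode nn) throws StandbyException;
    }

    // Try the NNs in configured order; fail over only when one rejects the call.
    static NameNode invoke(List<NameNode> nns, Call call) {
        for (NameNode nn : nns) {
            try {
                call.run(nn);
                return nn; // the first NN that accepts the call handles it
            } catch (StandbyException e) {
                // rejected by a Standby NN: try the next one
            }
        }
        throw new IllegalStateException("no NameNode accepted the call");
    }

    public static void main(String[] args) {
        NameNode nn1 = new NameNode("nn1", HAState.STANDBY);
        NameNode nn2 = new NameNode("nn2", HAState.ACTIVE);
        List<NameNode> nns = Arrays.asList(nn1, nn2);

        System.out.println("mkdir handled by " + invoke(nns, nn -> nn.mkdir()).name);
        System.out.println("setSafeMode handled by "
                + invoke(nns, nn -> nn.setSafeMode(true)).name);
    }
}
```

With the Standby NN listed first, the WRITE call fails over to nn2, while the
UNCHECKED setSafeMode call is accepted by nn1 and never reaches the Active NN.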
I propose the following improvements:
a. If a command can be classified as a READ, WRITE, CHECKPOINT or JOURNAL
operation, we should classify it explicitly.
b. If a command cannot be classified as one of those four categories, or if
it needs to take effect on both NNs, we should send the request to both the
Active and Standby NNs.
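
A minimal sketch of the dispatch rule proposed in (a) and (b), under the
assumption stated in this issue that only the Active NN accepts non-UNCHECKED
operations. The targets method, the mustApplyToAll flag and the NameNode class
are hypothetical names introduced for illustration; none of them exist in
DFSAdmin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AdminDispatchSketch {
    enum OperationCategory { UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL }
    enum HAState { ACTIVE, STANDBY }

    static final class NameNode {
        final String name;
        final HAState state;

        NameNode(String name, HAState state) {
            this.name = name;
            this.state = state;
        }
    }

    /**
     * Returns the names of the NNs that should receive a command.
     * Explicitly classified commands go to the Active NN only; commands that
     * stay UNCHECKED, or that must take effect everywhere, go to every NN.
     */
    static List<String> targets(OperationCategory category,
                                boolean mustApplyToAll,
                                List<NameNode> nns) {
        boolean broadcast = mustApplyToAll || category == OperationCategory.UNCHECKED;
        List<String> out = new ArrayList<>();
        for (NameNode nn : nns) {
            if (broadcast || nn.state == HAState.ACTIVE) {
                out.add(nn.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<NameNode> nns = Arrays.asList(
                new NameNode("nn1", HAState.STANDBY),
                new NameNode("nn2", HAState.ACTIVE));

        // A command reclassified as WRITE reaches only the Active NN.
        System.out.println(targets(OperationCategory.WRITE, false, nns));
        // A command that stays UNCHECKED is sent to both NNs.
        System.out.println(targets(OperationCategory.UNCHECKED, false, nns));
        // A classified command that must apply everywhere is also broadcast.
        System.out.println(targets(OperationCategory.WRITE, true, nns));
    }
}
```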

2. Refresh protocols: RefreshAuthorizationPolicyProtocol,
RefreshUserMappingsProtocol, RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl,
refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and
refreshCallQueue, are implemented by creating the corresponding RPC proxy and
sending the request to the remote NN. In the current implementation, these
requests are sent to a single fixed NN, regardless of whether it is Active or
Standby. I propose that we send these requests to both NNs.
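
Sending a refresh request to both NNs could look roughly like the sketch
below. It assumes one proxy per NN and chooses to keep going when one NN
fails, so that an unreachable NN does not prevent the other from being
refreshed; that error-handling policy is an assumption, not something this
issue specifies. RefreshTarget and refreshAll are hypothetical stand-ins for
the real RPC proxies and are not Hadoop APIs.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class RefreshFanOutSketch {
    // Hypothetical stand-in for an RPC proxy bound to one NN.
    interface RefreshTarget {
        void refresh() throws Exception;
    }

    /**
     * Sends the refresh to every NN and records a per-NN outcome.
     * A failure on one NN is recorded and does not stop the loop.
     */
    static Map<String, String> refreshAll(Map<String, RefreshTarget> proxies) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, RefreshTarget> e : proxies.entrySet()) {
            try {
                e.getValue().refresh();
                results.put(e.getKey(), "OK");
            } catch (Exception ex) {
                results.put(e.getKey(), "FAILED: " + ex.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, RefreshTarget> proxies = new LinkedHashMap<>();
        proxies.put("nn1", () -> { /* refresh succeeds */ });
        proxies.put("nn2", () -> { throw new IOException("connection refused"); });

        // Report the outcome for each NN so the operator can retry failures.
        for (Map.Entry<String, String> r : refreshAll(proxies).entrySet()) {
            System.out.println(r.getKey() + ": " + r.getValue());
        }
    }
}
```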

3. ClientDatanodeProtocol
Commands in this category are already handled correctly and need no
improvement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
