[
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zesheng Wu updated HDFS-6507:
-----------------------------
Attachment: HDFS-6507.8.patch
> Improve DFSAdmin to support HA cluster better
> ---------------------------------------------
>
> Key: HDFS-6507
> URL: https://issues.apache.org/jira/browse/HDFS-6507
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: tools
> Affects Versions: 2.4.0
> Reporter: Zesheng Wu
> Assignee: Zesheng Wu
> Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch, HDFS-6507.3.patch,
> HDFS-6507.4-inprogress.patch, HDFS-6507.4.patch, HDFS-6507.5.patch,
> HDFS-6507.6.patch, HDFS-6507.7.patch, HDFS-6507.7.patch, HDFS-6507.8.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category are generally implemented by calling the
> corresponding method of the DFSClient class, which in turn invokes the
> corresponding remote implementation on the NN side. On the NN side, all
> operations are classified into five categories: UNCHECKED, READ, WRITE,
> CHECKPOINT, JOURNAL. The Active NN allows all operations, and the Standby NN
> allows only UNCHECKED operations. In the current implementation, DFSClient
> connects to one NN first; if that NN is not Active and the operation is not
> allowed, it fails over to the second NN. Here is the problem: some of the
> commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes,
> setBalancerBandwidth, metaSave) in DFSAdmin are classified as UNCHECKED
> operations, so when they are executed from the DFSAdmin command line, they
> are sent to a single, fixed NN, regardless of whether it is Active or
> Standby. This may result in two problems:
> a. If the first NN tried is Standby, the operation takes effect only on the
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NNs but takes effect on
> only one, problems may arise later when there is an NN failover.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL
> operations, we should classify it clearly.
> b. If the command cannot be classified as one of the above four operations,
> or if the command needs to take effect on both NNs, we should send the
> request to both the Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol,
> RefreshUserMappingsProtocol, RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl,
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and
> sending the request to the remote NN. In the current implementation, these
> requests are sent to a single, fixed NN, regardless of whether it is Active
> or Standby. Here I propose that we send these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are already handled correctly and need no
> improvement.
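
The behaviour described above can be illustrated with a small, self-contained
Java model. This is only a sketch under stated assumptions, not the code in
the attached patches: FakeNameNode, sendWithFailover, sendToAll and
"exampleWriteCommand" are hypothetical names invented for illustration and are
not Hadoop APIs. It assumes, as the description states, that the Standby NN
accepts only UNCHECKED operations and that the client fails over only when an
operation is rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HaFanOutSketch {
    /** The five NN-side operation categories named in the description. */
    enum OperationCategory { UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL }

    enum HaState { ACTIVE, STANDBY }

    /** Hypothetical stand-in for one NameNode; not a Hadoop class. */
    static final class FakeNameNode {
        final String id;
        final HaState state;
        final List<String> applied = new ArrayList<>();

        FakeNameNode(String id, HaState state) {
            this.id = id;
            this.state = state;
        }

        /** Active allows every category; Standby allows only UNCHECKED. */
        boolean allows(OperationCategory category) {
            return state == HaState.ACTIVE || category == OperationCategory.UNCHECKED;
        }

        void invoke(String command, OperationCategory category) {
            if (!allows(category)) {
                throw new IllegalStateException(
                        category + " is not allowed on " + state + " NN " + id);
            }
            applied.add(command);
        }
    }

    /** Models the current behaviour: try NNs in order, move on only when rejected. */
    static String sendWithFailover(List<FakeNameNode> nns, String command,
                                   OperationCategory category) {
        for (FakeNameNode nn : nns) {
            try {
                nn.invoke(command, category);
                return nn.id; // the first NN that accepts the call wins
            } catch (IllegalStateException rejected) {
                // This NN refused the operation; fall through to the next one.
            }
        }
        throw new IllegalStateException("no NameNode accepted " + command);
    }

    /** Models the proposal: send to every NN and record a per-NN outcome. */
    static Map<String, String> sendToAll(List<FakeNameNode> nns, String command,
                                         OperationCategory category) {
        Map<String, String> results = new LinkedHashMap<>();
        for (FakeNameNode nn : nns) {
            try {
                nn.invoke(command, category);
                results.put(nn.id, "ok");
            } catch (IllegalStateException e) {
                results.put(nn.id, "failed: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        FakeNameNode nn1 = new FakeNameNode("nn1", HaState.STANDBY);
        FakeNameNode nn2 = new FakeNameNode("nn2", HaState.ACTIVE);
        List<FakeNameNode> nns = Arrays.asList(nn1, nn2);

        // An UNCHECKED command is accepted by the Standby and never reaches the Active.
        System.out.println(sendWithFailover(nns, "refreshNodes", OperationCategory.UNCHECKED));
        // A WRITE-classified command is rejected by the Standby, so the client fails over.
        System.out.println(sendWithFailover(nns, "exampleWriteCommand", OperationCategory.WRITE));
        // Fanning out reaches both NNs.
        System.out.println(sendToAll(nns, "refreshNodes", OperationCategory.UNCHECKED));
    }
}
```

With nn1 as Standby listed first, the UNCHECKED call lands only on nn1
(problem a), the WRITE-classified call fails over to nn2 (improvement a), and
the fan-out applies the command on both NNs (improvement b). Recording a
per-NN outcome rather than stopping at the first failure is a design choice of
this sketch, so that an operator could see which NN still needs the command.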
--
This message was sent by Atlassian JIRA
(v6.2#6252)