[ https://issues.apache.org/jira/browse/HDFS-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837177#comment-13837177 ]
Jing Zhao commented on HDFS-5555:
---------------------------------
bq. The other fix is to make sure the iterator supports failover as well.
Agree. Currently the cache pool iterator is defined inside
ClientNamenodeProtocolTranslatorPB and is always bound to the corresponding
rpcProxy, so it cannot support failover. We may want to define the iterator
inside DFSClient instead, since DFSClient#namenode is the failover-aware proxy
in an HA setup.
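A minimal sketch of that idea (illustrative only, not a patch; the exact
ClientProtocol/BatchedEntries signatures should be double-checked against
trunk) would be a DFSClient method like:
{code}
// Hypothetical DFSClient method, following the existing
// BatchedRemoteIterator pattern used by the cache RPCs.
public RemoteIterator<CachePoolInfo> listCachePools() throws IOException {
  checkOpen();
  return new BatchedRemoteIterator<String, CachePoolInfo>("") {
    @Override
    public BatchedEntries<CachePoolInfo> makeRequest(String prevPool)
        throws IOException {
      // 'namenode' is the retrying/failover ClientProtocol proxy, so a
      // StandbyException from one NN is handled by failing over rather
      // than surfacing to CacheAdmin.
      return namenode.listCachePools(prevPool);
    }

    @Override
    public String elementToPrevKey(CachePoolInfo element) {
      return element.getPoolName();
    }
  };
}
{code}
Since DFSClient#namenode is created through the configured failover proxy
provider, each batch request would then retry against the other NN on
StandbyException instead of failing the whole command.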
> CacheAdmin commands fail when first listed NameNode is in Standby
> -----------------------------------------------------------------
>
> Key: HDFS-5555
> URL: https://issues.apache.org/jira/browse/HDFS-5555
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: caching
> Affects Versions: 3.0.0
> Reporter: Stephen Chu
> Assignee: Jimmy Xiang
>
> I am on an HA-enabled cluster. The NameNodes are on host-1 and host-2.
> In the configuration, the host-1 NN is listed first and the host-2 NN second
> in the _dfs.ha.namenodes.ns1_ property (where _ns1_ is the name of the
> nameservice).
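> For reference, a minimal _hdfs-site.xml_ sketch of such a setup (the
> NameNode IDs _nn1_/_nn2_ and port 8020 are illustrative, not from this
> report):
> {code}
> <property>
>   <name>dfs.nameservices</name>
>   <value>ns1</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns1</name>
>   <value>nn1,nn2</value>  <!-- host-1 NN listed first -->
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns1.nn1</name>
>   <value>host-1:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns1.nn2</name>
>   <value>host-2:8020</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.ns1</name>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> {code}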
> If the host-1 NN is Standby and the host-2 NN is Active, some CacheAdmin
> commands fail, complaining that the operation is not supported in standby
> state.
> e.g.
> {code}
> bash-4.1$ hdfs cacheadmin -removeDirectives -path /user/hdfs2
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1501)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1082)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.listCacheDirectives(FSNamesystem.java:6892)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$ServerSideCacheEntriesIterator.makeRequest(NameNodeRpcServer.java:1263)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$ServerSideCacheEntriesIterator.makeRequest(NameNodeRpcServer.java:1249)
> at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
> at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
> at org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCacheDirectives(ClientNamenodeProtocolServerSideTranslatorPB.java:1087)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> at org.apache.hadoop.ipc.Client.call(Client.java:1348)
> at org.apache.hadoop.ipc.Client.call(Client.java:1301)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy9.listCacheDirectives(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest(ClientNamenodeProtocolTranslatorPB.java:1079)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest(ClientNamenodeProtocolTranslatorPB.java:1064)
> at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
> at org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
> at org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
> at org.apache.hadoop.hdfs.DistributedFileSystem$32.hasNext(DistributedFileSystem.java:1704)
> at org.apache.hadoop.hdfs.tools.CacheAdmin$RemoveCacheDirectiveInfosCommand.run(CacheAdmin.java:372)
> at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:84)
> at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:89)
> {code}
> After manually failing over from host-2 to host-1, the CacheAdmin commands
> succeed.
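> For example, the manual failover can be triggered with the standard HA admin
> tool (using the illustrative NameNode IDs _nn1_/_nn2_ from above):
> {code}
> # Make nn1 (host-1) active; nn2 (host-2) is the currently active NN.
> hdfs haadmin -failover nn2 nn1
> {code}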
> The affected commands are:
> -listPools
> -listDirectives
> -removeDirectives