[
https://issues.apache.org/jira/browse/HDFS-14230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Íñigo Goiri updated HDFS-14230:
-------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: HDFS-13891
Status: Resolved (was: Patch Available)
> RBF: Throw RetriableException instead of IOException when no namenodes
> available
> --------------------------------------------------------------------------------
>
> Key: HDFS-14230
> URL: https://issues.apache.org/jira/browse/HDFS-14230
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.2.0, 3.1.1, 2.9.2, 3.0.3
> Reporter: Fei Hui
> Assignee: Fei Hui
> Priority: Major
> Fix For: HDFS-13891
>
> Attachments: HDFS-14230-HDFS-13891.001.patch,
> HDFS-14230-HDFS-13891.002.patch, HDFS-14230-HDFS-13891.003.patch,
> HDFS-14230-HDFS-13891.004.patch, HDFS-14230-HDFS-13891.005.patch,
> HDFS-14230-HDFS-13891.006.patch
>
>
> Failover usually happens when upgrading namenodes. And there are no active
> namenodes within some seconds, Accessing HDFS through router fails at this
> moment. This could make jobs failure or hang. Some hive jobs logs are as
> follow
> {code:java}
> 2019-01-03 16:12:08,337 Stage-1 map = 100%, reduce = 100%, Cumulative CPU
> 133.33 sec
> MapReduce Total cumulative CPU time: 2 minutes 13 seconds 330 msec
> Ended Job = job_1542178952162_24411913
> Launching Job 4 out of 6
> Exception in thread "Thread-86" java.lang.RuntimeException:
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): No namenode
> available under nameservice Cluster3
> at
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.shouldRetry(RouterRpcClient.java:328)
> at
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:488)
> at
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:495)
> at
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:385)
> at
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:760)
> at
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
> Caused by:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
> Operation category READ is not supported in state standby
> at
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1804)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1338)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3925)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1014)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
> {code}
> Deep into the code. Maybe we can throw StandbyException when no namenodes
> available. Client will fail after some retries
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]