[
https://issues.apache.org/jira/browse/HDFS-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794166#comment-13794166
]
Daryn Sharp commented on HDFS-5322:
-----------------------------------
bq. During the transition (Standby -> Active), the current code first sets the
state of the NN to Active, then starts the active service, during which the NN
still needs to tail the remaining editlog.
This is what I was questioning. Isn't it wrong for the NN to claim it's
active/writable when it's not? It seems like another state is needed to
indicate that a transition is in progress and that the namespace isn't yet
writable.
Otherwise, Kerberos and known-token connections are going to block all the
handler threads during the transition. That means HA admin commands may
become blocked as well, which may be a serious problem.
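To make the suggestion concrete, here is a minimal sketch of what such an intermediate state could look like. It is not the actual NameNode HA state code: the class HaTransitionSketch, the State enum, and the methods checkWritable and transitionToActive are all hypothetical names chosen for illustration. The idea is only that the NN reports a non-writable "transitioning" state while it catches up on edits, and that callers are rejected immediately instead of holding a handler thread.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; these are NOT the real HDFS NameNode HA classes. */
class HaTransitionSketch {

    /** Each state records whether the namespace accepts writes. */
    enum State {
        STANDBY(false),
        TRANSITIONING_TO_ACTIVE(false), // assumed extra state: catching up on edits
        ACTIVE(true);

        final boolean writable;

        State(boolean writable) {
            this.writable = writable;
        }
    }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.STANDBY);

    State getState() {
        return state.get();
    }

    boolean isWritable() {
        return state.get().writable;
    }

    /** Fail fast instead of blocking the calling handler thread. */
    void checkWritable() {
        State s = state.get();
        if (!s.writable) {
            throw new IllegalStateException("Namespace not writable in state " + s);
        }
    }

    /**
     * Runs the catch-up work (for example, tailing the remaining edits)
     * while in the intermediate state, and only then reports ACTIVE.
     */
    void transitionToActive(Runnable catchUp) {
        if (!state.compareAndSet(State.STANDBY, State.TRANSITIONING_TO_ACTIVE)) {
            throw new IllegalStateException("Cannot transition from " + state.get());
        }
        try {
            catchUp.run();
            state.set(State.ACTIVE);
        } catch (RuntimeException e) {
            // Fall back so the node never claims to be active after a failure.
            state.set(State.STANDBY);
            throw e;
        }
    }

    public static void main(String[] args) {
        HaTransitionSketch nn = new HaTransitionSketch();
        nn.transitionToActive(() -> System.out.println(
                "during catch-up: " + nn.getState() + ", writable=" + nn.isWritable()));
        System.out.println(
                "after catch-up: " + nn.getState() + ", writable=" + nn.isWritable());
    }
}
```

In this sketch an admin call could read getState() without waiting on the transition, and a write arriving mid-transition gets an immediate error it can retry on. Whether that fits the existing HA state machine is a separate question.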
> HDFS delegation token not found in cache errors seen on secure HA clusters
> --------------------------------------------------------------------------
>
> Key: HDFS-5322
> URL: https://issues.apache.org/jira/browse/HDFS-5322
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Fix For: 2.2.1
>
> Attachments: HDFS-5322.000.patch, HDFS-5322.000.patch,
> HDFS-5322.001.patch, HDFS-5322.002.patch, HDFS-5322.003.patch,
> HDFS-5322.004.patch, HDFS-5322.005.patch, HDFS-5322.006.patch
>
>
> While running HA tests we have seen issues where HDFS delegation token
> not found in cache errors cause running jobs to fail.
> {code}
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> |2013-10-06 20:14:51,193 INFO [main] mapreduce.Job: Task Id :
> attempt_1381090351344_0001_m_000007_0, Status : FAILED
> Error:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
> token (HDFS_DELEGATION_TOKEN token 11 for hrt_qa) can't be found in cache
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
> {code}