[
https://issues.apache.org/jira/browse/HDFS-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791674#comment-13791674
]
Jing Zhao commented on HDFS-5322:
---------------------------------
bq. Again, the basic question driving this change is why
FSNamesystem#checkOperation(OperationCategory.WRITE) is not throwing during a
transition to active?
During the transition (Standby -> Active), the current code first sets the
state of the NN to Active and then starts the active services, during which the
NN still needs to tail the remaining editlog. If a delegation token is contained
in that last part of the editlog, then 1)
FSNamesystem#checkOperation(OperationCategory.WRITE) will not throw anything,
since the NN's state has already been changed to Active, and 2) the new ANN
(active NameNode) cannot find the token in its cache, since it has not finished
applying the editlog. We should allow clients to retry in this case: once the NN
finishes reading the editlog, the delegation token will be recognized.
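A minimal Java sketch of the race and of the "let the client retry" idea. It does not use the real FSNamesystem or SecretManager code; `NameNodeModel`, `beginTransitionToActive`, `finishTailingEdits`, `checkToken`, `RetriableTokenException` and `InvalidTokenException` are all hypothetical names chosen for illustration. The point it tries to show is that a token-cache miss while the NN is still tailing edits should be reported as retriable rather than as a hard "can't be found in cache" failure.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the Standby -> Active race; not the real Hadoop classes. */
public class TransitionRaceSketch {

    public enum HAState { STANDBY, ACTIVE }

    /** Hypothetical signal telling the client to retry the same NN. */
    public static class RetriableTokenException extends Exception {
        public RetriableTokenException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for SecretManager$InvalidToken. */
    public static class InvalidTokenException extends Exception {
        public InvalidTokenException(String msg) { super(msg); }
    }

    public static class NameNodeModel {
        public HAState state = HAState.STANDBY;
        public boolean inTransitionToActive = false;
        public final Set<Integer> tokenCache = new HashSet<>();

        /** Current ordering: flip the state first, then tail the remaining edits. */
        public void beginTransitionToActive() {
            state = HAState.ACTIVE;      // a WRITE-category check would stop throwing here
            inTransitionToActive = true; // ...although the editlog is not fully applied yet
        }

        /** Apply the tail of the editlog, which may contain new delegation tokens. */
        public void finishTailingEdits(int... tokenIdsFromEditLog) {
            for (int id : tokenIdsFromEditLog) {
                tokenCache.add(id);
            }
            inTransitionToActive = false;
        }

        /** Token lookup: a miss during the transition is reported as retriable. */
        public void checkToken(int tokenId) throws RetriableTokenException, InvalidTokenException {
            if (tokenCache.contains(tokenId)) {
                return;
            }
            if (inTransitionToActive) {
                throw new RetriableTokenException("token " + tokenId + " not loaded yet; retry");
            }
            throw new InvalidTokenException("token " + tokenId + " can't be found in cache");
        }
    }

    public static void main(String[] args) throws Exception {
        NameNodeModel nn = new NameNodeModel();
        nn.beginTransitionToActive();
        try {
            nn.checkToken(11);
        } catch (RetriableTokenException e) {
            System.out.println("retry: " + e.getMessage());
        }
        nn.finishTailingEdits(11);
        nn.checkToken(11); // succeeds once the editlog has been applied
        System.out.println("token 11 accepted");
    }
}
```

Only a miss that happens while `inTransitionToActive` is set is turned into a retry; a token that is still missing after the editlog is applied keeps failing as an invalid token, so genuinely bad tokens are not retried forever.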
Alternatively, if we let the NN first start the active services and only then
change its state to Active, your original hack in HADOOP-9880 would work, since a
StandbyException would be thrown while the editlog is still being tailed. But
this change would 1) extend the failover time, and 2) trigger unnecessary client
failovers. I'm also not sure whether it would break other code.
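A rough Java sketch of why a StandbyException during tailing costs an extra client failover, while a retriable error does not. The client loop, `NameNodeCall`, `StandbyLikeException` and `RetriableLikeException` are hypothetical and are not Hadoop's real retry/failover API; the model simply assumes a client that switches to the other NN on a standby error and retries the same NN on a retriable one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client model; not Hadoop's real RetryPolicy/FailoverProxyProvider. */
public class FailoverVsRetrySketch {

    /** Hypothetical stand-in for StandbyException: "go to the other NN". */
    public static class StandbyLikeException extends Exception {}

    /** Hypothetical stand-in for a retriable error: "try this NN again". */
    public static class RetriableLikeException extends Exception {}

    public interface NameNodeCall {
        String invoke(int nnIndex) throws StandbyLikeException, RetriableLikeException;
    }

    /** Calls NN 0 first; returns a log of each attempt, ending with the result. */
    public static List<String> call(NameNodeCall rpc, int maxAttempts) {
        List<String> log = new ArrayList<>();
        int current = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                log.add("nn" + current + ": " + rpc.invoke(current));
                return log;
            } catch (StandbyLikeException e) {
                log.add("nn" + current + ": standby, failing over");
                current = 1 - current; // switch to the other NameNode
            } catch (RetriableLikeException e) {
                log.add("nn" + current + ": retriable, retrying same NN");
            }
        }
        log.add("gave up");
        return log;
    }

    public static void main(String[] args) {
        // nn0 is becoming active; nn1 is now standby. First call hits nn0 mid-transition.
        int[] c1 = {0};
        NameNodeCall standbyWhileTailing = nn -> {
            int n = c1[0]++;
            if (nn == 1 || n == 0) throw new StandbyLikeException();
            return "ok";
        };
        int[] c2 = {0};
        NameNodeCall retriableWhileTailing = nn -> {
            int n = c2[0]++;
            if (n == 0) throw new RetriableLikeException();
            return "ok";
        };
        System.out.println(call(standbyWhileTailing, 5));   // fails over to nn1 and back
        System.out.println(call(retriableWhileTailing, 5)); // stays on nn0
    }
}
```

In this model the standby path needs three attempts (nn0, then nn1, then back to nn0), while the retriable path needs two and never leaves nn0; that extra round trip to the real standby is the "unnecessary client failover" mentioned above.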
> HDFS delegation token not found in cache errors seen on secure HA clusters
> --------------------------------------------------------------------------
>
> Key: HDFS-5322
> URL: https://issues.apache.org/jira/browse/HDFS-5322
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.1.1-beta
> Reporter: Arpit Gupta
> Assignee: Jing Zhao
> Attachments: HDFS-5322.000.patch, HDFS-5322.000.patch,
> HDFS-5322.001.patch, HDFS-5322.002.patch, HDFS-5322.003.patch,
> HDFS-5322.004.patch
>
>
> While running HA tests we have seen "HDFS delegation token not found in cache"
> errors that cause running jobs to fail.
> {code}
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> 2013-10-06 20:14:51,193 INFO [main] mapreduce.Job: Task Id : attempt_1381090351344_0001_m_000007_0, Status : FAILED
> Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 11 for hrt_qa) can't be found in cache
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
> {code}
--
This message was sent by Atlassian JIRA
(v6.1#6144)