[
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040790#comment-14040790
]
Daryn Sharp commented on HDFS-6475:
-----------------------------------
bq. Your earlier suggestion indicated that we should use
SecretManager#retriableRetrievePassword instead of
SecretManager#retrievePassword, does that mean client code has to be modified?
If I understand the question: The methods are only used server-side so no
client-side changes should be required, so no incompatibility concerns.
Did you happen to trace how/where the {{StandbyException}} is wrapped in an
{{InvalidToken}}? It looks like
{{DelegationTokenSecretManager#retrievePassword}} is the only place it occurs,
but {{DelegationTokenSecretManager#retriableRetrievePassword}} does not wrap
exceptions in {{InvalidToken}}.
Is this maybe just a test case issue? Which testcase is failing?
> WebHdfs clients fail without retry because incorrect handling of
> StandbyException
> ---------------------------------------------------------------------------------
>
> Key: HDFS-6475
> URL: https://issues.apache.org/jira/browse/HDFS-6475
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, webhdfs
> Affects Versions: 2.4.0
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch,
> HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch,
> HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch,
> HDFS-6475.008.patch, HDFS-6475.009.patch
>
>
> With WebHdfs clients connected to a HA HDFS service, the delegation token is
> previously initialized with the active NN.
> When clients try to issue request, the NN it contacts is stored in a map
> returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact
> the NN based on the order, so likely the first one it runs into is StandbyNN.
> If the StandbyNN doesn't have the updated client crediential, it will throw a
> s SecurityException that wraps StandbyException.
> The client is expected to retry another NN, but due to the insufficient
> handling of SecurityException mentioned above, it failed.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information:
> org.apache.hadoop.security.token.SecretManager$InvalidToken:
> StandbyException, javaCl
> assName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to
> obtain user group information:
> org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
> at
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
> at
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
> at
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
> at
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
> at
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
> at
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
> at
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
> at
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
> at kclient1.kclient$1.run(kclient.java:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
> at kclient1.kclient.main(kclient.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)