[
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040264#comment-14040264
]
Yongjun Zhang commented on HDFS-6475:
-------------------------------------
Hi [~daryn], [~jingzhao] and [~atm],
Many thanks for the earlier reviews and comments. I just uploaded a new
revision (008) to address the comments and the test errors.
In summary,
- Per Daryn's suggestion, I attempted to remove the getTrueCause() method from
Server.java entirely, but ran into test failures. After spending quite some
time looking into them, I personally think removing the existing
getTrueCause() method really deserves a new JIRA, so I filed HDFS-6588 with
details and questions. I hope you'd agree based on the information I provided
there, but I'm certainly open to further discussion.
- The new patch I just uploaded for HDFS-6475 is limited to handling the case
reported in this JIRA. It is only a few lines in ExceptionHandler.java, plus
the testcase I added. It no longer calls the getTrueCause() method defined in
Server.java.
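To illustrate the idea only (this is not the patch itself), here is a minimal
sketch of unwrapping a SecurityException so that an underlying StandbyException
becomes visible to the retry logic. The class UnwrapSketch, the method
toRetriableCause(), and the two nested exception classes are hypothetical
stand-ins so the example runs without Hadoop on the classpath; the real change
lives in ExceptionHandler.java and may be structured differently.

```java
import java.io.IOException;

public class UnwrapSketch {
    // Hypothetical stand-in for org.apache.hadoop.ipc.StandbyException.
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for SecretManager.InvalidToken.
    static class InvalidToken extends IOException {
        InvalidToken(String msg) { super(msg); }
    }

    // If e is a SecurityException whose cause chain contains a
    // StandbyException, return that StandbyException; otherwise return e.
    static Throwable toRetriableCause(Throwable e) {
        if (!(e instanceof SecurityException)) {
            return e;
        }
        for (Throwable t = e.getCause(); t != null; t = t.getCause()) {
            if (t instanceof StandbyException) {
                return t;
            }
        }
        return e;
    }

    public static void main(String[] args) {
        // Build the chain seen in this JIRA:
        // SecurityException -> InvalidToken -> StandbyException
        InvalidToken invalid = new InvalidToken("StandbyException");
        invalid.initCause(new StandbyException("NN is in standby state"));
        SecurityException wrapped =
            new SecurityException("Failed to obtain user group information", invalid);

        // Prints "StandbyException"
        System.out.println(toRetriableCause(wrapped).getClass().getSimpleName());
    }
}
```

A SecurityException that does not wrap a StandbyException is returned
unchanged in this sketch, so genuine authentication failures would still be
reported to the client as such.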
Thanks a lot for the follow-up.
> WebHdfs clients fail without retry because incorrect handling of
> StandbyException
> ---------------------------------------------------------------------------------
>
> Key: HDFS-6475
> URL: https://issues.apache.org/jira/browse/HDFS-6475
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, webhdfs
> Affects Versions: 2.4.0
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch,
> HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch,
> HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch,
> HDFS-6475.008.patch
>
>
> With WebHdfs clients connected to an HA HDFS service, the delegation token
> is initially obtained from the active NN.
> When a client issues a request, the NNs it can contact are stored in a map
> returned by DFSUtil.getNNServiceRpcAddresses(conf). The client contacts the
> NNs in that order, so the first one it reaches is likely to be the StandbyNN.
> If the StandbyNN doesn't have the updated client credential, it throws a
> SecurityException that wraps a StandbyException.
> The client is expected to retry with another NN, but because the
> SecurityException mentioned above is not handled adequately, the request
> fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
>     at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
>     at kclient1.kclient$1.run(kclient.java:64)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:356)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>     at kclient1.kclient.main(kclient.java:58)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}