[
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039718#comment-14039718
]
Yongjun Zhang commented on HDFS-6475:
-------------------------------------
Hi,
I uploaded a new patch per [~daryn]'s suggestion. On top of what Daryn
suggested, and because of the exceptions I described in my previous update (with
stack traces), I added handling for InvalidToken in ExceptionHandler:
{code}
if (e instanceof SecurityException) {
  e = toCause(e);
}
if (e instanceof InvalidToken) {
  e = toCause(e);
}
{code}
This unwrapping is essentially the reason I originally wanted to share the
getTrueCause method.
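For readers without the patch at hand, the two-step unwrapping above can be sketched in isolation. This is only a minimal sketch: the nested StandbyException and InvalidToken classes are hypothetical stand-ins for the Hadoop types, and this toCause is an assumed simplification (return the cause when it is an Exception, otherwise the original), not the actual ExceptionHandler code.

```java
import java.io.IOException;

public class CauseUnwrapSketch {
    // Hypothetical stand-in for org.apache.hadoop.ipc.StandbyException.
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for SecretManager.InvalidToken.
    static class InvalidToken extends IOException {
        InvalidToken(String msg, Throwable cause) { super(msg, cause); }
    }

    // Assumed simplification: peel one layer if the cause is an Exception.
    static Exception toCause(Exception e) {
        Throwable t = e.getCause();
        return (t instanceof Exception) ? (Exception) t : e;
    }

    // Mirrors the two checks from the comment: unwrap SecurityException
    // first, then unwrap InvalidToken if that is what was inside.
    static Exception unwrap(Exception e) {
        if (e instanceof SecurityException) {
            e = toCause(e);
        }
        if (e instanceof InvalidToken) {
            e = toCause(e);
        }
        return e;
    }

    public static void main(String[] args) {
        Exception wrapped = new SecurityException(
            "Failed to obtain user group information",
            new InvalidToken("StandbyException", new StandbyException("standby")));
        // Prints the simple class name of the innermost exception reached.
        System.out.println(unwrap(wrapped).getClass().getSimpleName());
    }
}
```

With both checks in place, a SecurityException that wraps an InvalidToken that in turn wraps a StandbyException resolves to the StandbyException, which is the type the client needs to see in order to retry against another NN.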
Hi Daryn, would you please help review again?
BTW, regarding your comment:
{quote}
In saslProcess, just throw the exception instead of running it through
getTrueCause since it's not a "InvalidToken wrapping another exception" anymore.
{quote}
I did what you suggested, but I'm still getting an InvalidToken exception (see the
stack trace mentioned above). So it seems that the exception saslProcess tries
to handle comes from a different source than the one I'm running into.
Thanks a lot.
> WebHdfs clients fail without retry because incorrect handling of
> StandbyException
> ---------------------------------------------------------------------------------
>
> Key: HDFS-6475
> URL: https://issues.apache.org/jira/browse/HDFS-6475
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, webhdfs
> Affects Versions: 2.4.0
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch,
> HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch,
> HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch
>
>
> With WebHdfs clients connected to an HA HDFS service, the delegation token has
> previously been initialized with the active NN.
> When a client tries to issue a request, the NNs it can contact are stored in a map
> returned by DFSUtil.getNNServiceRpcAddresses(conf). The client contacts
> the NNs in that order, so the first one it reaches is likely the StandbyNN.
> If the StandbyNN doesn't have the updated client credential, it throws a
> SecurityException that wraps a StandbyException.
> The client is expected to retry with another NN, but due to the insufficient
> handling of the SecurityException mentioned above, it fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information:
> org.apache.hadoop.security.token.SecretManager$InvalidToken:
> StandbyException, javaClassName=java.lang.SecurityException,
> exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
>     at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
>     at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
>     at kclient1.kclient$1.run(kclient.java:64)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:356)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>     at kclient1.kclient.main(kclient.java:58)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}