[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041171#comment-14041171
 ] 

Yongjun Zhang commented on HDFS-6475:
-------------------------------------

HI [~daryn],

Would like to check with you, you mentioned 
{quote}
If it turns out to be a lot more complicated, then perhaps a followup jira is ok
{quote}
Based on the information we have so far, the work involved is to remove 
getTrueCause, and replace the retrievePassword with retriableRetrievePassword, 
changing the interface spec of relevant methods because 
retriableRetrievePassword throws more exceptions,  removing getTrueCause 
method, fixing the test failures reported in HDFS-6588. 

I hope you'd agree that it's appropriate to dedicate HDFS-6588 for the above 
mentioned work, and use the lastest patch I posted for HDFS-6475 to handle the 
SecurityException that UserProvider throws.  Would you please comment again? 
Thanks.

BTW, thanks [~jingzhao] for clarifying things in HDFS-6588 (sorry I had a typo 
in last update as 6589). 


> WebHdfs clients fail without retry because incorrect handling of 
> StandbyException
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-6475
>                 URL: https://issues.apache.org/jira/browse/HDFS-6475
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, webhdfs
>    Affects Versions: 2.4.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, 
> HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, 
> HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, 
> HDFS-6475.008.patch, HDFS-6475.009.patch
>
>
> With WebHdfs clients connected to a HA HDFS service, the delegation token is 
> previously initialized with the active NN.
> When clients try to issue request, the NN it contacts is stored in a map 
> returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact 
> the NN based on the order, so likely the first one it runs into is StandbyNN. 
> If the StandbyNN doesn't have the updated client crediential, it will throw a 
> s SecurityException that wraps StandbyException.
> The client is expected to retry another NN, but due to the insufficient 
> handling of SecurityException mentioned above, it failed.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: 
> StandbyException, javaCl
> assName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
> obtain user group information: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
>         at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
>         at kclient1.kclient$1.run(kclient.java:64)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:356)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>         at kclient1.kclient.main(kclient.java:58)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to