[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039675#comment-14039675
 ] 

Yongjun Zhang commented on HDFS-6475:
-------------------------------------

Hello Guys,

Thank you all for the review and discussion.

As a follow-up, the first thing I did was to check that 
retriableRetrievePassword does give StandbyException.

Then I tried the fix that [~daryn] suggested for this jira, and I still ran 
into the same symptom on the client side:

{code}
org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:348)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:108)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:580)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:546)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
        at kclient1.kclient1$2.run(kclient1.java:80)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
        at kclient1.kclient1.main(kclient1.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): StandbyException
        at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:108)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
{code}

I traced the server side and found the following:

UserProvider.getValue caught 
{{org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException}}
with the following stack:

{code}
UserProvider.getValue(HttpContext) line: 56
UserProvider.getValue(HttpContext) line: 41
InjectableValuesProvider.getInjectableValues(HttpContext) line: 46
AbstractResourceMethodDispatchProvider$ResponseOutInvoker(AbstractResourceMethodDispatchProvider$EntityParamInInvoker).getParams(HttpContext) line: 153
AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(Object, HttpContext) line: 203
AbstractResourceMethodDispatchProvider$ResponseOutInvoker(ResourceJavaMethodDispatcher).dispatch(Object, HttpContext) line: 75
HttpMethodRule.accept(CharSequence, Object, UriRuleContext) line: 288
ResourceClassRule.accept(CharSequence, Object, UriRuleContext) line: 108
RightHandPathRule.accept(CharSequence, Object, UriRuleContext) line: 147
RootResourceClassesRule.accept(CharSequence, Object, UriRuleContext) line: 84
WebApplicationImpl._handleRequest(WebApplicationContext, ContainerRequest) line: 1469
WebApplicationImpl._handleRequest(WebApplicationContext, ContainerRequest, ContainerResponse) line: 1400
WebApplicationImpl.handleRequest(ContainerRequest, ContainerResponse) line: 1349
WebApplicationImpl.handleRequest(ContainerRequest, ContainerResponseWriter) line: 1339
ServletContainer$InternalWebComponent(WebComponent).service(URI, URI, HttpServletRequest, HttpServletResponse) line: 416
ServletContainer.service(URI, URI, HttpServletRequest, HttpServletResponse) line: 537
ServletContainer.service(HttpServletRequest, HttpServletResponse) line: 699
ServletContainer(HttpServlet).service(ServletRequest, ServletResponse) line: 820
ServletHolder.handle(ServletRequest, ServletResponse) line: 511
ServletHandler$CachedChain.doFilter(ServletRequest, ServletResponse) line: 1221
AuthFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 82
ServletHandler$CachedChain.doFilter(ServletRequest, ServletResponse) line: 1212
HttpServer2$QuotingInputFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 1183
ServletHandler$CachedChain.doFilter(ServletRequest, ServletResponse) line: 1212
NoCacheFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 45
ServletHandler$CachedChain.doFilter(ServletRequest, ServletResponse) line: 1212
NoCacheFilter.doFilter(ServletRequest, ServletResponse, FilterChain) line: 45
ServletHandler$CachedChain.doFilter(ServletRequest, ServletResponse) line: 1212
ServletHandler.handle(String, HttpServletRequest, HttpServletResponse, int) line: 399
SecurityHandler.handle(String, HttpServletRequest, HttpServletResponse, int) line: 216
SessionHandler.handle(String, HttpServletRequest, HttpServletResponse, int) line: 182
WebAppContext(ContextHandler).handle(String, HttpServletRequest, HttpServletResponse, int) line: 766
WebAppContext.handle(String, HttpServletRequest, HttpServletResponse, int) line: 450
ContextHandlerCollection.handle(String, HttpServletRequest, HttpServletResponse, int) line: 230
Server(HandlerWrapper).handle(String, HttpServletRequest, HttpServletResponse, int) line: 152
Server.handle(HttpConnection) line: 326
HttpConnection.handleRequest() line: 542
HttpConnection$RequestHandler.headerComplete() line: 928
{code}

Upon catching this exception, UserProvider wraps it in a SecurityException and 
rethrows it.

Then at ExceptionHandler, the first Exception caught is
{{ContainerException with cause: 
com.sun.jersey.api.container.ContainerException: Exception obtaining 
parameters}}
which in turn has cause
{{java.lang.SecurityException: Failed to obtain user group information: 
org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException}}

With the current code in ExceptionHandler, the ContainerException's cause is 
extracted first; then the three lines of code I added for SecurityException 
extract the InvalidToken exception. The unwrapping stops there, so the 
InvalidToken with cause StandbyException is sent back to the client, which 
causes the client-side exception described at the beginning of this comment.
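
To make the unwrapping order concrete, here is a minimal sketch of the nesting and of the two unwrapping steps described above. It is not the actual ExceptionHandler code: the nested exception classes are stand-ins so the example compiles without Hadoop or Jersey, the nesting is simplified to a single ContainerException level, and the class and method names (CurrentUnwrapSketch, buildObservedChain, unwrapAsDescribed) are hypothetical.

```java
import java.io.IOException;

// Sketch only: stand-in types replace the real Hadoop/Jersey classes.
class CurrentUnwrapSketch {
  static class StandbyException extends IOException {
    StandbyException(String msg) { super(msg); }
  }

  static class InvalidToken extends IOException {
    InvalidToken(String msg) { super(msg); }
  }

  static class ContainerException extends RuntimeException {
    ContainerException(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Rebuilds the nesting seen on the server side (message text is illustrative). */
  static Exception buildObservedChain() {
    InvalidToken invalidToken = new InvalidToken("StandbyException");
    invalidToken.initCause(new StandbyException("hypothetical standby message"));
    SecurityException wrapped = new SecurityException(
        "Failed to obtain user group information: " + invalidToken, invalidToken);
    return new ContainerException("Exception obtaining parameters", wrapped);
  }

  /** Mirrors the two steps described above: ContainerException, then SecurityException. */
  static Throwable unwrapAsDescribed(Throwable e) {
    if (e instanceof ContainerException && e.getCause() != null) {
      e = e.getCause();
    }
    if (e instanceof SecurityException && e.getCause() != null) {
      e = e.getCause();
    }
    return e; // unwrapping stops here
  }

  public static void main(String[] args) {
    Throwable sent = unwrapAsDescribed(buildObservedChain());
    System.out.println(sent.getClass().getSimpleName());            // InvalidToken
    System.out.println(sent.getCause().getClass().getSimpleName()); // StandbyException
  }
}
```

In this sketch the exception that would be reported is still the InvalidToken, with the StandbyException only reachable as its cause, which matches the symptom seen on the client.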

That explains the client-side exception. I could add one more piece of logic 
to ExceptionHandler:
{code} 
   if the exception is InvalidToken, 
     extract cause
{code}

Since ExceptionHandler can take several checks like this, I guess it should 
be fine. I will give it a try.
Please let me know if you know of a better option than this.
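
For discussion, here is a rough sketch of what that extra step might look like. It is only an illustration under assumptions, not the patch itself: the nested exception classes are stand-ins for the real Hadoop and Jersey types, and ProposedUnwrapSketch, buildChain and unwrap are hypothetical names.

```java
import java.io.IOException;

// Sketch only: stand-in types replace the real Hadoop/Jersey classes.
class ProposedUnwrapSketch {
  static class StandbyException extends IOException {
    StandbyException(String msg) { super(msg); }
  }

  static class InvalidToken extends IOException {
    InvalidToken(String msg) { super(msg); }
  }

  static class ContainerException extends RuntimeException {
    ContainerException(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Rebuilds the nesting seen on the server side (message text is illustrative). */
  static Exception buildChain() {
    InvalidToken invalidToken = new InvalidToken("StandbyException");
    invalidToken.initCause(new StandbyException("hypothetical standby message"));
    SecurityException wrapped = new SecurityException(
        "Failed to obtain user group information: " + invalidToken, invalidToken);
    return new ContainerException("Exception obtaining parameters", wrapped);
  }

  /** The two existing steps plus the proposed InvalidToken step. */
  static Throwable unwrap(Throwable e) {
    if (e instanceof ContainerException && e.getCause() != null) {
      e = e.getCause();
    }
    if (e instanceof SecurityException && e.getCause() != null) {
      e = e.getCause();
    }
    // Proposed addition: if the exception is InvalidToken, extract its cause.
    if (e instanceof InvalidToken && e.getCause() != null) {
      e = e.getCause();
    }
    return e;
  }

  public static void main(String[] args) {
    System.out.println(unwrap(buildChain()).getClass().getSimpleName()); // StandbyException
  }
}
```

With the extra step the sketch reports the StandbyException itself, while an InvalidToken that has no cause is passed through unchanged.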

Thanks a lot.



> WebHdfs clients fail without retry because incorrect handling of 
> StandbyException
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-6475
>                 URL: https://issues.apache.org/jira/browse/HDFS-6475
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, webhdfs
>    Affects Versions: 2.4.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, 
> HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, 
> HDFS-6475.005.patch, HDFS-6475.006.patch
>
>
> With WebHdfs clients connected to an HA HDFS service, the delegation token 
> has previously been initialized with the active NN.
> When a client issues a request, the NNs it can contact are stored in a map 
> returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts 
> the NNs in that order, so the first one it runs into is likely the StandbyNN. 
> If the StandbyNN doesn't have the updated client credential, it throws a 
> SecurityException that wraps a StandbyException.
> The client is expected to retry another NN, but due to the insufficient 
> handling of the SecurityException mentioned above, it fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
>         at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
>         at kclient1.kclient$1.run(kclient.java:64)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:356)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>         at kclient1.kclient.main(kclient.java:58)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}



