Yongjun Zhang created HDFS-6475: ----------------------------------- Summary: WebHdfs clients fail without retry because incorrect handling of StandbyException Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang
With WebHdfs clients connected to a HA HDFS service, the delegation token is previously initialized with the active NN. When clients try to issue request, the NN it contacts is stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact the NN based on the order, so likely the first one it runs into is StandbyNN. If the StandbyNN doesn't have the updated client crediential, it will throw a s SecurityException that wraps StandbyException. The client is expected to retry another NN, but due to the insufficient handling of SecurityException mentioned above, it failed. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaCl assName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)