[
https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040245#comment-14040245
]
Yongjun Zhang commented on HDFS-6588:
-------------------------------------
There are the following errors (see
https://builds.apache.org/job/PreCommit-HDFS-Build/7198//testReport/):
{code}
org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[0] 0.6 sec 1
org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[1] 22 ms 1
org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[2] 18 ms 1
org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[3] 18 ms 1
org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[4] 15 ms 1
org.apache.hadoop.hdfs.web.TestWebHdfsTokens.testLazyTokenFetchForSWebhdfs
5.5 sec 1
org.apache.hadoop.hdfs.web.TestWebHdfsTokens.testLazyTokenFetchForWebhdfs
{code}
The errors belong to two categories:
1. TestWebHdfsTokens
It failed due to the following exception:
{code}
java.security.PrivilegedActionException:
org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token 5 for DummyUser) can't be found in cache.
{code}
thus causing a SecurityException thrown by UserProvider with cause
InvalidToken, which is the InvalidToken listed above. The reason is "token not
found in cache" and it has no cause assigned.
This means UserProvider could throw two types of exception relevant to our
discussion:
T1. SecurityException with InvalidToken as cause, which has StandbyException as
cause. Reported in HDFS-6475.
T2. SecurityException with InvalidToken as cause, which has no cause assigned,
and the failure was because "token not found in cache".
I found that currently committeed code (including Daryn's fix HDFS-6222) appear
to have handled T2, and there actually seems to be no real problem with T2,
except my earlier patch to HDFS-6475 "touched" it by returning the
InvalidToken, which triggerd these test failures. I am modifying the patch of
HDFS-6475 to just process T1 (and not T2) and submitting new version there.
Hi [~daryn] thanks a lot for your review and comments earlier at HDFS-6475.
hope this change will be fine with you. I will invite you to comment there when
I have the new version uploaded. Appreciate very much.
2. TestSaslRPC
{code}
expected:<[Token is invali]d> but was:<[DIGEST-MD5: IO error acquiring
passwor]d>
Stacktrace
org.junit.ComparisonFailure: expected:<[Token is invali]d> but
was:<[DIGEST-MD5: IO error acquiring passwor]d>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at
org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage(TestSaslRPC.java:387)
{code}
This error happened because we no longer pass the exception through
getTrueCause.
The stack when the above error happened is listed below as a reference. We can
see that the {{retriableRetrievePassword}} method is called, and this test
designed it to return InvalidToken because of class {{BadTokenSecretManager}}
in the test code TestSaslRPC.java.
Hi [~jingzhao], many thanks for your review and advice in HDFS-6475. The
{{getTrueCause}} method is part of the HDFS-5322 solution you worked out. Would
you please comment whether the above failed tests (failed because of
getTrueCause removal) reflect some real use scenarios? If so, then the logic
getgTrueCause need to be reflected in somewhere if this method is to be
removed. Thanks.
{code}
TestSaslRPC$BadTokenSecretManager.retrievePassword(TestSaslRPC$TestTokenIdentifier)
line: 268
TestSaslRPC$BadTokenSecretManager(TestSaslRPC$TestTokenSecretManager).retrievePassword(TokenIdentifier)
line: 1
TestSaslRPC$BadTokenSecretManager(SecretManager<T>).retriableRetrievePassword(T)
line: 91
SaslRpcServer$SaslDigestCallbackHandler.getPassword(TokenIdentifier) line: 275
SaslRpcServer$SaslDigestCallbackHandler.handle(Callback[]) line: 302
DigestMD5Server.validateClientResponse(byte[][]) line: 585
DigestMD5Server.evaluateResponse(byte[]) line: 244
Server$Connection.processSaslToken(RpcHeaderProtos$RpcSaslProto) line: 1392
Server$Connection.processSaslMessage(RpcHeaderProtos$RpcSaslProto) line: 1369
Server$Connection.saslProcess(RpcHeaderProtos$RpcSaslProto) line: 1270
Server$Connection.saslReadAndProcess(DataInputStream) line: 1244
Server$Connection.processRpcOutOfBandRequest(RpcHeaderProtos$RpcRequestHeaderProto,
DataInputStream) line: 1932
Server$Connection.processOneRpc(byte[]) line: 1805
Server$Connection.readAndProcess() line: 1548
Server$Listener.doRead(SelectionKey) line: 753
Server$Listener$Reader.doRunLoop() line: 627
Server$Listener$Reader.run() line: 598
{code}
BTW, I have spent quite some time to analyze this, I do believe removal of
{{getTrueCause}} is a separate issue and deserve a new jira here. I hope you'd
agree based on the information provided above. If you still think it should be
taken care of as part of HDFS-6475, please let me know. I'm certainly open for
discussion. Thanks a lot.
> Investigating removing getTrueCause method in Server.java
> ---------------------------------------------------------
>
> Key: HDFS-6588
> URL: https://issues.apache.org/jira/browse/HDFS-6588
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: security, webhdfs
> Affects Versions: 2.5.0
> Reporter: Yongjun Zhang
>
> When addressing Daryn Sharp's comment for HDFS-6475 quoted below:
> {quote}
> What I'm saying is I think the patch adds too much unnecessary code. Filing
> an improvement to delete all but a few lines of the code changed in this
> patch seems a bit odd. I think you just need to:
> - Delete getTrueCause entirely instead of moving it elsewhere
> - In saslProcess, just throw the exception instead of running it through
> getTrueCause since it's not a "InvalidToken wrapping another exception"
> anymore.
> - Keep your 3-line change to unwrap SecurityException in toResponse
> {quote}
> There are multiple test failures, after making the suggested changes, Filing
> this jira to dedicate to the investigation of removing getTrueCause method.
> More detail will be put in the first comment.
--
This message was sent by Atlassian JIRA
(v6.2#6252)