[ 
https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040245#comment-14040245
 ] 

Yongjun Zhang commented on HDFS-6588:
-------------------------------------

There are the following errors (see 
https://builds.apache.org/job/PreCommit-HDFS-Build/7198//testReport/):

{code} 
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[0]  0.6 sec 1
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[1]  22 ms   1
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[2]  18 ms   1
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[3]  18 ms   1
 org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage[4]  15 ms   1
 org.apache.hadoop.hdfs.web.TestWebHdfsTokens.testLazyTokenFetchForSWebhdfs     
5.5 sec 1
 org.apache.hadoop.hdfs.web.TestWebHdfsTokens.testLazyTokenFetchForWebhdfs
{code}

The errors belong to two categories:

1. TestWebHdfsTokens

It failed due to the following exception:
{code}
java.security.PrivilegedActionException: 
org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token 5 for DummyUser) can't be found in cache.
{code}
thus causing a SecurityException thrown by UserProvider with cause 
InvalidToken, which is the InvalidToken listed above. The reason is "token not 
found in cache" and it has no cause assigned.

This means UserProvider could throw two types of exception relevant to our 
discussion:

T1. SecurityException with InvalidToken as cause, which has StandbyException as 
cause. Reported in HDFS-6475.
T2. SecurityException with InvalidToken as cause, which has no cause assigned, 
and the failure was because "token not found in cache".

I found that currently committeed code (including Daryn's fix HDFS-6222) appear 
to have handled T2, and there actually seems to be no real problem with T2, 
except my earlier patch to HDFS-6475 "touched" it by returning the 
InvalidToken, which triggerd these test failures.  I am modifying the patch of 
HDFS-6475 to just process T1 (and not T2) and submitting new version there. 

Hi [~daryn]  thanks a lot for your review and comments earlier at HDFS-6475. 
hope this change will be fine with you. I will invite you to comment there when 
I have the new version uploaded. Appreciate very much.

2. TestSaslRPC

{code}
expected:<[Token is invali]d> but was:<[DIGEST-MD5: IO error acquiring 
passwor]d>
Stacktrace

org.junit.ComparisonFailure: expected:<[Token is invali]d> but 
was:<[DIGEST-MD5: IO error acquiring passwor]d>
        at org.junit.Assert.assertEquals(Assert.java:115)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at 
org.apache.hadoop.ipc.TestSaslRPC.testErrorMessage(TestSaslRPC.java:387)
{code}

This error happened because we no longer pass the exception through 
getTrueCause. 

The stack when the above error happened is listed below as a reference.  We can 
see that the {{retriableRetrievePassword}} method is called, and this test 
designed it to return InvalidToken because of class {{BadTokenSecretManager}} 
in the test code TestSaslRPC.java. 

Hi [~jingzhao], many thanks for your review and advice in HDFS-6475. The 
{{getTrueCause}} method is part of the HDFS-5322 solution you worked out. Would 
you please comment whether the above failed tests (failed because of  
getTrueCause removal) reflect some real use scenarios? If so, then the logic 
getgTrueCause need to be reflected in somewhere if this method is to be 
removed. Thanks.

{code}
TestSaslRPC$BadTokenSecretManager.retrievePassword(TestSaslRPC$TestTokenIdentifier)
 line: 268   
TestSaslRPC$BadTokenSecretManager(TestSaslRPC$TestTokenSecretManager).retrievePassword(TokenIdentifier)
 line: 1 
TestSaslRPC$BadTokenSecretManager(SecretManager<T>).retriableRetrievePassword(T)
 line: 91       
SaslRpcServer$SaslDigestCallbackHandler.getPassword(TokenIdentifier) line: 275  
SaslRpcServer$SaslDigestCallbackHandler.handle(Callback[]) line: 302    
DigestMD5Server.validateClientResponse(byte[][]) line: 585      
DigestMD5Server.evaluateResponse(byte[]) line: 244      
Server$Connection.processSaslToken(RpcHeaderProtos$RpcSaslProto) line: 1392     
Server$Connection.processSaslMessage(RpcHeaderProtos$RpcSaslProto) line: 1369   
Server$Connection.saslProcess(RpcHeaderProtos$RpcSaslProto) line: 1270  
Server$Connection.saslReadAndProcess(DataInputStream) line: 1244        
Server$Connection.processRpcOutOfBandRequest(RpcHeaderProtos$RpcRequestHeaderProto,
 DataInputStream) line: 1932 
Server$Connection.processOneRpc(byte[]) line: 1805      
Server$Connection.readAndProcess() line: 1548   
Server$Listener.doRead(SelectionKey) line: 753  
Server$Listener$Reader.doRunLoop() line: 627    
Server$Listener$Reader.run() line: 598  
{code}

BTW, I have spent quite some time to analyze this, I do believe removal of 
{{getTrueCause}} is a separate issue and deserve a new jira here. I hope you'd 
agree based on the information provided above. If you still think it should be 
taken care of as part of HDFS-6475, please let me know. I'm certainly open for 
discussion. Thanks a lot.




> Investigating removing getTrueCause method in Server.java
> ---------------------------------------------------------
>
>                 Key: HDFS-6588
>                 URL: https://issues.apache.org/jira/browse/HDFS-6588
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: security, webhdfs
>    Affects Versions: 2.5.0
>            Reporter: Yongjun Zhang
>
> When addressing Daryn Sharp's comment for HDFS-6475 quoted below:
> {quote}
> What I'm saying is I think the patch adds too much unnecessary code. Filing 
> an improvement to delete all but a few lines of the code changed in this 
> patch seems a bit odd. I think you just need to:
> - Delete getTrueCause entirely instead of moving it elsewhere
> - In saslProcess, just throw the exception instead of running it through 
> getTrueCause since it's not a "InvalidToken wrapping another exception" 
> anymore.
> - Keep your 3-line change to unwrap SecurityException in toResponse
> {quote}
> There are multiple test failures, after making the suggested changes, Filing 
> this jira to dedicate to the investigation of removing getTrueCause method.
> More detail will be put in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to