[ 
https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367319#comment-15367319
 ] 

Rakesh R commented on HDFS-10336:
---------------------------------

Thanks [~linyiqun] for the explanation and the work.
bq. This jira is focused on resetting the UGI in the test, as we can see from the 
title. 
Agreed. In that case, how about making the simple fix 
{{UserGroupInformation.reset();}} in this jira, with a patch that contains only 
the one-line change shown below? As you mentioned, the test timeout failures 
can be handled in another jira. 
{code}
       // Reset UGI so that other tests are not affected.
+      UserGroupInformation.reset();
       UserGroupInformation.setConfiguration(new Configuration());
{code}
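
To illustrate why the extra call matters, here is a minimal, self-contained sketch of the pattern. It assumes the failure comes from cached static state that applying a new configuration alone does not clear. {{FakeUgi}} and {{UgiResetSketch}} are hypothetical stand-ins written for this example; they are not Hadoop's {{UserGroupInformation}} and do not reproduce its internals.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a class holding process-wide static state.
 * This is NOT Hadoop's UserGroupInformation; it only mimics the pattern.
 */
final class FakeUgi {
    private static Map<String, String> conf; // last configuration applied
    private static String loginUser;         // cached login; survives setConfiguration()

    static void setConfiguration(Map<String, String> newConf) {
        // Replaces the configuration but leaves the cached login untouched.
        conf = new HashMap<>(newConf);
    }

    static void loginFromKeytab(String principal) {
        loginUser = principal;
    }

    static void reset() {
        // Clears everything, including the cached login.
        conf = null;
        loginUser = null;
    }

    static boolean hasCachedLogin() {
        return loginUser != null;
    }

    static boolean isSecure() {
        return conf != null && "kerberos".equals(conf.get("auth"));
    }
}

final class UgiResetSketch {
    /** Simulates a secure-mode test whose finally block cleans up, with or without reset(). */
    static void runSecureTest(boolean resetFirst) {
        Map<String, String> secure = new HashMap<>();
        secure.put("auth", "kerberos");
        try {
            FakeUgi.setConfiguration(secure);
            FakeUgi.loginFromKeytab("balancer@EXAMPLE.COM"); // made-up principal
        } finally {
            if (resetFirst) {
                FakeUgi.reset(); // the proposed one-line addition
            }
            FakeUgi.setConfiguration(new HashMap<>());
        }
    }

    public static void main(String[] args) {
        runSecureTest(false);
        System.out.println("cached login without reset: " + FakeUgi.hasCachedLogin()); // true
        FakeUgi.reset();
        runSecureTest(true);
        System.out.println("cached login with reset: " + FakeUgi.hasCachedLogin()); // false
    }
}
```

In this sketch, cleaning up with only a fresh configuration leaves the cached login behind for the next test, while resetting first leaves no state to leak.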

bq.  I think 10 mins is a long enough time, because the test only takes around 
15 seconds locally. If these tests still time out, maybe we should do further 
optimization rather than increasing the timeout. I suggest that we do the 
optimization work in a separate jira.
I ran the test 5 times in my local env; the slowest run took 45 secs and the 
fastest took 18 secs. Could you please create a jira, if one has not been 
raised yet, to handle the timeout case separately? It would also be good to 
add a link back to this jira for future reference. Thanks!

> TestBalancer failing intermittently because of not reseting 
> UserGroupInformation completely
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10336
>                 URL: https://issues.apache.org/jira/browse/HDFS-10336
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10336.001.patch, HDFS-10336.002.patch, 
> HDFS-10336.003.patch
>
>
> The unit test {{TestBalancer}} fails intermittently. 
> I looked into it and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} timed out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 300000 milliseconds
>       at java.lang.Thread.sleep(Native Method)
>       at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
>       at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
>       at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
>       at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
>       at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
>       at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
>       at org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes did not 
> completely reset the {{UGI}} in its finally block. This caused other unit 
> tests to throw an {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
>       at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following 
> line before the existing {{UGI}} reset operation to avoid the potential 
> exception:
> {code}
> UserGroupInformation.reset();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
