stack commented on HBASE-5833:

More digging.  The newest test added here, 
testShouldCheckMasterFailOverWhenMETAIsInOpenedState, is a little interesting.  
It was added by this commit:

r1172063 | tedyu | 2011-09-17 13:27:00 -0700 (Sat, 17 Sep 2011) | 3 lines

HBASE-4400  .META. getting stuck if RS hosting it is dead and znode state is in
               RS_ZK_REGION_OPENED (Ramkrishna)


The test is a bunch of copy/paste, confirming stuff it's not using.  It then
does a cluster shutdown, but does it explicitly on a cluster object and not via
HBaseTestingUtility, though it subsequently starts a cluster with
HBaseTestingUtility.  Not using HTU to do both the shutdown and the startup can
leave the HTU state confused about whether there is a master available, so we
just wait forever.  This seems to be responsible for the case where the test
would time out after 15 minutes and report no tests run and none failed.

I added a timeout for this test of 3 minutes.
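
For illustration only (this is not the patch itself): in JUnit 4 a per-test
timeout is normally written as @Test(timeout = 180000), and the
FailOnTimeout$StatementThread frame in the dump below is JUnit's machinery for
it.  The idea is roughly what this stdlib-only sketch does; TimeoutSketch and
finishesWithin are hypothetical names, not HBase or JUnit API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of HBase or JUnit.
class TimeoutSketch {
  /** Runs body on a worker thread; returns true if it finished within timeoutMs. */
  static boolean finishesWithin(Runnable body, long timeoutMs) throws Exception {
    // Daemon worker so a body that never returns cannot keep the JVM alive.
    ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timeout-sketch-worker");
      t.setDaemon(true);
      return t;
    });
    Future<?> result = worker.submit(body);
    try {
      result.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      result.cancel(true); // interrupt the stuck body
      return false;
    } finally {
      worker.shutdownNow();
    }
  }
}
```

With a 3 minute limit the call would be finishesWithin(body, 180000); a body
that waits forever on a master then fails fast instead of hanging the build.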

Another interesting thing is that TestMasterFailover starts a cluster per
method, but shutdown leaves some threads around.  I dug in some and was able to
clean up an LruBlockCache eviction thread, but others persist and would take a
little more work to undo.  They seem harmless but I'll list them anyway:

TestMasterFailover [JUnit]
        org.eclipse.jdt.internal.junit.runner.RemoteTestRunner at
                Thread [main] (Running)
                Thread [ReaderThread] (Running)
                Thread [Thread-2] (Suspended (breakpoint at line 587 in
                        HBaseTestingUtility.shutdownMiniCluster() line: 587
                        TestMasterFailover.testSimpleMasterFailover() line: 178
                        NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
                        NativeMethodAccessorImpl.invoke(Object, Object[]) line:
                        DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25
                        Method.invoke(Object, Object...) line: 597
                        FrameworkMethod$1.runReflectiveCall() line: 45
                        FrameworkMethod$1(ReflectiveCallable).run() line: 15
                        FrameworkMethod.invokeExplosively(Object, Object...) line: 42
                        InvokeMethod.evaluate() line: 20
                        FailOnTimeout$StatementThread.run() line: 62
                Daemon Thread [Poller SunPKCS11-Darwin] (Running)
                Thread [pool-1-thread-1] (Running)
                Thread [pool-2-thread-1] (Running)
                Thread [pool-3-thread-1] (Running)
                Thread [pool-4-thread-1] (Running)
                Daemon Thread [LeaseChecker] (Running)
                Daemon Thread
                Daemon Thread
                Daemon Thread [Master:2;,54838,1335066803952-EventThread] (Running)
                Daemon Thread [Master:1;,54836,1335066798880-EventThread] (Running)
                Daemon Thread
        /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java (Apr 21, 2012 8:53:07 PM)
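
A listing like this can also be collected without a debugger.  The following is
a minimal sketch, assuming it is enough to compare live thread names before a
cluster start and after its shutdown; LeftoverThreads and its methods are
hypothetical names, and only Thread.getAllStackTraces() is real JDK API.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of HBase.
class LeftoverThreads {
  /** Names of all live threads in this JVM, with daemon threads marked. */
  static Set<String> liveThreadNames() {
    Set<String> names = new TreeSet<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive()) {
        names.add(t.getName() + (t.isDaemon() ? " (daemon)" : ""));
      }
    }
    return names;
  }

  /** Threads alive now whose names were not in the earlier snapshot. */
  static Set<String> startedSince(Set<String> before) {
    Set<String> leftover = liveThreadNames();
    leftover.removeAll(before);
    return leftover;
  }
}
```

Taking a snapshot in test setup and calling startedSince(snapshot) in teardown
would print whatever the shutdown left behind.  Threads with generic names such
as pool-4-thread-1 stay hard to attribute, which is why naming them helps.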

The thread names are enhanced (in v2 of this patch), but things like
decayingSampleTick are set in a static, so they are hard to get rid of in test
setup.  The SendThread/EventThread are zk client hangouts.  Not sure what
pool-4-thread-1 is (I've enhanced the HTable executor to include htable in its
name so these are identifiable going forward, but the above executor does not
seem to be HTable's).
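
As a rough sketch of the naming idea (not the code in the patch): an executor
gets identifiable thread names when it is built with a ThreadFactory that
prefixes them.  NamedDaemonThreadFactory and the "htable-pool" prefix are
hypothetical, chosen only for illustration.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory for illustration; not the HTable change itself.
class NamedDaemonThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger(1);

  NamedDaemonThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    // Name threads <prefix>-thread-N instead of the default pool-N-thread-M.
    Thread t = new Thread(r, prefix + "-thread-" + counter.getAndIncrement());
    t.setDaemon(true); // a leftover thread then cannot block JVM exit
    return t;
  }
}
```

Passing new NamedDaemonThreadFactory("htable-pool") to something like
Executors.newCachedThreadPool(factory) makes the pool's threads show up under
that prefix in a thread dump.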
> 0.92 build has been failing pretty consistently on TestMasterFailover....
> -------------------------------------------------------------------------
>                 Key: HBASE-5833
>                 URL: https://issues.apache.org/jira/browse/HBASE-5833
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.92.2
>         Attachments: 5833.txt, closehregions.txt
> Trunk seems fine but 0.92 fails on this test pretty regularly.  Running it 
> local it seems to hang for me.
