When I run that locally (latest trunk) it passes:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hbase.master.TestMasterFailover
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 69.721 sec

Results :

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 minutes 29 seconds
[INFO] Finished at: Thu Nov 03 22:06:25 PDT 2011
[INFO] Final Memory: 58M/286M
[INFO] ------------------------------------------------------------------------


In the log I see some JMX-related exceptions, but their timing did not
suggest any potentially hanging threads.

(Linux, OpenJDK 1.6 64 bit, needed to set umask to 022)


-- Lars



----- Original Message -----
From: Ted Yu <[email protected]>
To: [email protected]
Cc: 
Sent: Thursday, November 3, 2011 8:55 PM
Subject: TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS fails on Jenkins

Hi,
Currently TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS
<https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/105/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testMasterFailoverWithMockedRITOnDeadRS/>
consistently fails on 0.92 and TRUNK.

I intended to log a JIRA but https://issues.apache.org is giving me a 503
error.

I briefly went over the code.
I think after each region is added to regionsThatShouldBeOnline, we should
log the name of the region:
    // Region of enabled on dead server gets closed but not ack'd by master
    region = enabledAndOnDeadRegions.remove(0);
    regionsThatShouldBeOnline.add(region);
    log("2. expecting " + region.toString() + " to be online: ");

so that if the assertion below fails we know what type of scenario wasn't
working:
    for (HRegionInfo hri : regionsThatShouldBeOnline) {
      assertTrue("region=" + hri.getRegionNameAsString(),
          onlineRegions.contains(hri));
    }
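
A minimal sketch of the same idea taken one step further, assuming plain
region-name strings instead of HRegionInfo: record the scenario next to each
expected region, so a failure names the scenario directly. ExpectedRegions,
expectOnline and missingFrom are hypothetical names for illustration, not
part of the HBase test code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not HBase code: pairs each expected region with the
// scenario that put it on the "should be online" list.
public class ExpectedRegions {
  // Region name -> scenario description; insertion order is preserved.
  private final Map<String, String> scenarioByRegion = new LinkedHashMap<>();

  /** Registers a region that must be online once failover completes. */
  public void expectOnline(String regionName, String scenario) {
    scenarioByRegion.put(regionName, scenario);
  }

  /** Returns one line per expected region that is absent from onlineRegions. */
  public List<String> missingFrom(Set<String> onlineRegions) {
    List<String> missing = new ArrayList<>();
    for (Map.Entry<String, String> e : scenarioByRegion.entrySet()) {
      if (!onlineRegions.contains(e.getKey())) {
        missing.add("region=" + e.getKey() + " scenario=" + e.getValue());
      }
    }
    return missing;
  }
}
```

A test could then assert that missingFrom(...) is empty and print the
returned list in the failure message.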

In the above-mentioned test output I saw a lot of:

2011-11-03 21:52:58,652 FATAL [Thread-558.logSyncer] wal.HLog(1106): Could not sync. Requesting close of hlog
java.io.IOException: Reflection
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:225)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1090)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1194)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1056)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:223)
    ... 4 more
Caused by: java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3483)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
    ... 8 more

Maybe they have something to do with regions stuck in RIT.
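
For reading the trace above: the outer "Reflection" IOException only wraps
the real failure, which is the innermost "DFSOutputStream is closed". Below
is a small sketch of how a reflective call can produce that three-level
chain and how to walk down to the root cause. ReflectiveSync and ClosedStream
are hypothetical stand-ins; this is not the actual SequenceFileLogWriter
code, only an illustration of the wrapping pattern the trace suggests.

```java
import java.io.IOException;
import java.lang.reflect.Method;

// Hypothetical illustration, not HBase code.
public class ReflectiveSync {
  /** Stand-in for an output stream whose sync() fails because it is closed. */
  public static class ClosedStream {
    public void sync() throws IOException {
      throw new IOException("DFSOutputStream is closed");
    }
  }

  /** Invokes target.sync() via reflection and wraps any reflective failure. */
  public static void syncViaReflection(Object target) throws IOException {
    try {
      Method m = target.getClass().getMethod("sync");
      m.invoke(target);
    } catch (ReflectiveOperationException e) {
      // Method.invoke wraps the stream's exception in InvocationTargetException,
      // which is wrapped once more here.
      throw new IOException("Reflection", e);
    }
  }

  /** Follows getCause() until the innermost exception is reached. */
  public static Throwable rootCause(Throwable t) {
    while (t.getCause() != null) {
      t = t.getCause();
    }
    return t;
  }
}
```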

Cheers
