When I run that locally (latest trunk) it passes:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hbase.master.TestMasterFailover
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 69.721 sec
Results :

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 minutes 29 seconds
[INFO] Finished at: Thu Nov 03 22:06:25 PDT 2011
[INFO] Final Memory: 58M/286M
[INFO] ------------------------------------------------------------------------

In the log I see some JMX-related exceptions, but their timing did not suggest any potentially hanging threads.
(Linux, OpenJDK 1.6 64 bit; needed to set umask to 022)

-- Lars

----- Original Message -----
From: Ted Yu <[email protected]>
To: [email protected]
Cc:
Sent: Thursday, November 3, 2011 8:55 PM
Subject: TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS fails on Jenkins

Hi,
Currently TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS
<https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/105/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testMasterFailoverWithMockedRITOnDeadRS/>
<https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testMasterFailoverWithMockedRITOnDeadRS/>
consistently fails on 0.92 and TRUNK. I intended to log a JIRA but
https://issues.apache.org is giving me a 503 error.

I briefly went over the code. I think after each region is added to
regionsThatShouldBeOnline, we should log the name of the region:

    // Region of enabled on dead server gets closed but not ack'd by master
    region = enabledAndOnDeadRegions.remove(0);
    regionsThatShouldBeOnline.add(region);
    log("2. expecting " + region.toString() + " to be online: ");

so that if the assertion below fails we know what type of scenario wasn't working:

    for (HRegionInfo hri : regionsThatShouldBeOnline) {
      assertTrue("region=" + hri.getRegionNameAsString(), onlineRegions.contains(hri));
    }

From the above-mentioned test output I saw a lot of:

2011-11-03 21:52:58,652 FATAL [Thread-558.logSyncer] wal.HLog(1106): Could not sync. Requesting close of hlog
java.io.IOException: Reflection
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:225)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1090)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1194)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1056)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:223)
    ... 4 more
Caused by: java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3483)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
    ... 8 more

Maybe they have something to do with regions stuck in RIT.

Cheers
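[Editor's note: the log-then-assert pattern Ted suggests could be sketched as a small standalone helper. This is only an illustration, not the actual test code: plain Strings stand in for HRegionInfo, System.out stands in for the test's log() helper, and all names below are hypothetical.]

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the suggestion: log each region at the moment it is added to the
// "should be online" list, so a later assertion failure names the scenario.
public class RegionAssertSketch {

    // Move the next region from source to shouldBeOnline, logging which
    // scenario step expects it. Returns the region for convenience.
    static String expectOnline(List<String> source, List<String> shouldBeOnline, String step) {
        String region = source.remove(0);
        shouldBeOnline.add(region);
        System.out.println(step + " expecting " + region + " to be online");
        return region;
    }

    // Returns the first expected region that is not online, or null if all are.
    static String firstMissing(List<String> shouldBeOnline, Set<String> onlineRegions) {
        for (String hri : shouldBeOnline) {
            if (!onlineRegions.contains(hri)) {
                return hri;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> enabledAndOnDeadRegions = new ArrayList<>(List.of("regionA", "regionB"));
        List<String> regionsThatShouldBeOnline = new ArrayList<>();
        expectOnline(enabledAndOnDeadRegions, regionsThatShouldBeOnline, "2.");
        expectOnline(enabledAndOnDeadRegions, regionsThatShouldBeOnline, "3.");

        // Simulate a master that only brought one of the two regions online.
        Set<String> onlineRegions = new HashSet<>(List.of("regionA"));
        String missing = firstMissing(regionsThatShouldBeOnline, onlineRegions);
        if (missing != null) {
            System.out.println("would fail: region=" + missing + " not online");
        }
    }
}
```

With the per-step log lines emitted before the assertion, a Jenkins failure message like "region=regionB" can be matched back to the numbered scenario that set it up.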
