0.90.4 was just released. Can you run TestRollingRestart using latest 0.90 branch ?
This test didn't fail on https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/ Cheers On Wed, Aug 10, 2011 at 6:21 PM, Zhoushuaifeng <[email protected]>wrote: > Is there anyone else do this test and encounter the failure? And what's > your opinion? > > Zhou Shuaifeng(Frank) > > -----Original Message----- > From: Zhoushuaifeng [mailto:[email protected]] > Sent: Wednesday, August 10, 2011 10:09 AM > To: [email protected] > Subject: TestRollingRestart fail occasionally > > Hi, > I run TestRollingRestart(0.90.3), it fails occasionally. The failing log > shows that split log runs in to a circle, the recoverFileLease fail and the > while() never end and the test timeout and fail. > > Here are some of the logs: > After restarting primary master, not all the RSs connected before the > master stop waiting: > TRR: Restarting primary master > INFO [Master:0;linux1.site:35977] master.ServerManager(660): Waiting on > regionserver(s) count to settle; currently=3 > 2011-07-06 09:12:56,331 INFO [Master:0;linux1.site:35977] > master.ServerManager(660): Waiting on regionserver(s) count to settle; > currently=3 > 2011-07-06 09:12:57,831 INFO [Master:0;linux1.site:35977] > master.ServerManager(648): Finished waiting for regionserver count to > settle; count=3, sleptFor=4500 > 2011-07-06 09:12:57,831 INFO [Master:0;linux1.site:35977] > master.ServerManager(674): Exiting wait on regionserver(s) to checkin; > count=3, stopped=false, count of regions out on cluster=22 > 2011-07-06 09:12:57,834 INFO [Master:0;linux1.site:35977] > master.MasterFileSystem(180): Log folder > hdfs://localhost:41078/user/root/.logs/linux1.site,54949,1309914772108 > doesn't belong to a known region server, splitting > > But after master starting split, another RS connected: > 2011-07-06 09:13:54,243 INFO > [RegionServer:3;linux1.site,54949,1309914772108] > regionserver.HRegionServer(1456): Attempting connect to Master server at > linux1.site:35977 > 2011-07-06 09:13:54,243 INFO > [RegionServer:3;linux1.site,54949,1309914772108] > regionserver.HRegionServer(1475): Connected to master at linux1.site:35977 > > Then, split log recover lease may encounter AlreadyBeingCreatedException > and show this log: > 2011-07-06 09:13:57,929 WARN [Master:0;linux1.site:35977] > util.FSUtils(715): Waited 60087ms for lease recovery on > hdfs://localhost:41078/user/root/.logs/linux1.site,54949,1309914772108/linux1.site%3A54949.1309914772175:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: > failed to create file > /user/root/.logs/linux1.site,54949,1309914772108/linux1.site%3A54949.1309914772175 > for DFSClient_hb_m_linux1.site:35977_1309914773252 on client 127.0.0.1, > because this file is already being created by > DFSClient_hb_rs_linux1.site,54949,1309914772108_1309914772161 on 127.0.0.1 > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1202) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:1157) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.recoverLease(NameNode.java:404) > at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown > Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) > at > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961) > at > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957) > at java.security.AccessController.doPrivileged(Native > Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955) > > This log shows continuing about 14 minutes and test fail. > > This test fail occasionally may because the master default waiting time is > 4500ms, usually it's enouth for all the RS to check in, but some times it's > not, and the RS check in later may disturb the recover lease. > This may be a bug, And may have some relation to HBASE-4177. > > > Zhou Shuaifeng(Frank) > > >
