We use HBase 0.20.6 with HBASE-2473. I think we may have hit HBASE-2599.
I am looking at 2599-0.20.txt <https://issues.apache.org/jira/secure/attachment/12445536/2599-0.20.txt>, which you attached to the JIRA.
I cannot find how to apply this change to HRegionServer.java:

-    serverInfo.setStartCode(System.currentTimeMillis());
+    this.serverInfo =
+      createServerInfoWithNewStartCode(this.serverInfo);

I only found one call of the following form, at line 776 in protected void init(final MapWritable c):

    this.hlogFlusher.setHLog(hlog);

If someone can help me apply the patch, that would be great.

On Fri, Aug 13, 2010 at 5:36 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Ah very helpful, see how .META. is getting reassigned even if it has a
> valid assignment? Some environments get this for some reason, and this
> is fixed by https://issues.apache.org/jira/browse/HBASE-2599 which you
> will need to apply on your hbase.
>
> J-D
>
> On Fri, Aug 13, 2010 at 5:22 PM, Marchwiak, Patrick D.
> <[email protected]> wrote:
> > I've attached the log.
> >
> > One more thing I'll add is that the stop-hbase.sh script hangs on
> > the "stopping master..." line so I had to manually kill the HMaster
> > process before doing a restart.
> >
> > On 8/13/10 5:00 PM, "Jean-Daniel Cryans" <[email protected]> wrote:
> >
> >> A clean log of a full master startup would be really useful, can't
> >> tell much more by the current info you provided.
> >>
> >> J-D
> >>
> >> On Fri, Aug 13, 2010 at 4:50 PM, Marchwiak, Patrick D.
> >> <[email protected]> wrote:
> >>> I am having issues performing any operations (list/create/put) on my
> >>> hbase instance once it starts up.
> >>>
> >>> The environment:
> >>> Red Hat 5.5
> >>> Hadoop 0.20.2
> >>> HBase 0.20.4
> >>> java 1.6.0_20
> >>> 1 running master
> >>> 23 running regionserver + 3 also running zookeeper
> >>>
> >>> When attempting to do a list from the hbase shell it returns this error:
> >>> NativeException: org.apache.hadoop.hbase.MasterNotRunningException: null
> >>>
> >>> When attempting to perform inserts from a hadoop job I see the following
> >>> error in my application:
> >>>
> >>> 2010-08-13 14:03:22.207 INFO [main] JobClient:1317 Task Id :
> >>> attempt_201006091333_0031_m_000000_0, Status : FAILED
> >>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
> >>> to locate root region
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:930)
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:594)
> >>>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
> >>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127)
> >>>     ...
> >>>
> >>> Now contrary to what the shell is reporting, the HMaster process is
> >>> definitely running (along with HRegionServer and HQuorumPeer on the
> >>> appropriate other nodes in the cluster). I do not see any errors in the
> >>> master log, though interestingly I noticed a log message mentioning only 7
> >>> region servers - in fact there are more than twice that many in the cluster.
> >>>
> >>> 2010-08-13 14:04:32,018 INFO org.apache.hadoop.hbase.master.ServerManager: 7
> >>> region servers, 0 dead, average load 3.142857142857143
> >>>
> >>> The last clue I have is some exceptions in the zookeeper logs:
> >>>
> >>> 2010-08-13 13:34:16,041 WARN
> >>> org.apache.zookeeper.server.PrepRequestProcessor: Got exception when
> >>> processing sessionid:0x12a6d2847e40000 type:create cxid:0x28
> >>> zxid:0xfffffffffffffffe txntype:unknown n/a
> >>> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
> >>> NodeExists
> >>>     at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
> >>>     at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
> >>> 2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn:
> >>> Connected to /128.115.210.161:35883 lastZxid 0
> >>> 2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn:
> >>> Creating new session 0x12a6d2847e40001
> >>> 2010-08-13 14:05:08,800 INFO org.apache.zookeeper.server.NIOServerCnxn:
> >>> Finished init of 0x12a6d2847e40001 valid:true
> >>> 2010-08-13 14:05:08,802 WARN
> >>> org.apache.zookeeper.server.PrepRequestProcessor: Got exception when
> >>> processing sessionid:0x12a6d2847e40001 type:create cxid:0x1
> >>> zxid:0xfffffffffffffffe txntype:unknown n/a
> >>> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
> >>> NodeExists
> >>>     at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
> >>>     at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
> >>> 2010-08-13 14:05:09,762 WARN org.apache.zookeeper.server.NIOServerCnxn:
> >>> Exception causing close of session 0x12a6d2847e40001 due to
> >>> java.io.IOException: Read error
> >>> 2010-08-13 14:05:09,763 INFO org.apache.zookeeper.server.NIOServerCnxn:
> >>> closing session:0x12a6d2847e40001 NIOServerCnxn:
> >>> java.nio.channels.SocketChannel[connected local=/128.115.210.149:2181
> >>> remote=/128.115.210.161:35883]
> >>>
> >>> HBase was running on this cluster a few months ago so I doubt it is a
> >>> blatant misconfiguration at fault. I've tried restarting everything hbase or
> >>> hadoop related as well as wiping out the hbase data directory on hdfs to
> >>> start fresh with no result. Any hints or suggestions as to what the problem
> >>> might be are greatly appreciated. Thanks!
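
One way to narrow down why the HRegionServer.java hunk from 2599-0.20.txt cannot be located is to check which of the patch's removed lines are actually present in the local source. The Python sketch below does that for the hunk quoted at the top of this thread. The function name missing_removed_lines and the one-line source_text stand-in are hypothetical, chosen only for illustration; a real check would read the patch and the local HRegionServer.java from disk.

```python
def missing_removed_lines(patch_text, source_text):
    """Return the '-' lines of a unified diff that do not occur in source_text.

    Comparison ignores leading/trailing whitespace. File headers ('--- path')
    are skipped; a removed line whose own content begins with '--' would be
    skipped too, which is acceptable for this rough check.
    """
    source_lines = {line.strip() for line in source_text.splitlines()}
    missing = []
    for line in patch_text.splitlines():
        if line.startswith("---") or not line.startswith("-"):
            continue
        removed = line[1:].strip()
        if removed and removed not in source_lines:
            missing.append(removed)
    return missing


# Hunk as quoted in the message; indentation is assumed.
patch_text = """\
-    serverInfo.setStartCode(System.currentTimeMillis());
+    this.serverInfo =
+      createServerInfoWithNewStartCode(this.serverInfo);
"""

# Stand-in for the local file (hypothetical): only the call the poster found.
source_text = """\
    this.hlogFlusher.setHLog(hlog);
"""

for line in missing_removed_lines(patch_text, source_text):
    print("not found in source:", line)
```

If a removed line is reported as not found, the local file probably differs from the revision the patch was generated against, and that hunk would likely need to be adapted by hand rather than applied automatically.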
