[ https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924602#action_12924602 ]
stack commented on HBASE-3147:
------------------------------

I got this when I tried running the patch....

{code}
java.lang.IllegalAccessError: tried to access method org.apache.hadoop.hbase.zookeeper.ZKAssign.getNodeName(Lorg/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher;Ljava/lang/String;)Ljava/lang/String; from class org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor
        at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1457)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
2010-10-25 16:07:44,354 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: sv2borg180:60000.timeoutMonitor exiting
{code}

Let me try to fix it.

> Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3147
>                 URL: https://issues.apache.org/jira/browse/HBASE-3147
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.0
>
>
> The rolling restart script is great for bringing on the weird stuff. On my
> little loaded cluster, if I run it, it horks the cluster and the cluster
> doesn't recover. I notice two issues that need fixing:
> 1. We'll miss noticing that a server was carrying .META. and it never gets
> assigned -- the shutdown handlers get stuck in perpetual wait on a .META.
> assign that will never happen.
> 2. Perpetual cycling of this sequence for each region not successfully assigned:
> {code}
> 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b. state=PENDING_OPEN, ts=1287869814294
> 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN or OPENING for too long, reassigning region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
> 2010-10-23 21:37:57,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2bd57d1475046a Attempting to transition node 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE
> 2010-10-23 21:37:57,404 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2bd57d1475046a Attempt to transition the unassigned node for 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE failed, the node existed but was in the state M_ZK_REGION_OFFLINE
> 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region transitioned OPENING to OFFLINE so skipping timeout, region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
>
> ,,,
> {code}
> The timeout period elapses again and then the same sequence repeats.
> This is what I've been working on.
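
For anyone following along, here is one possible reading of the cycle in item 2, as a toy Java sketch. It is not HBase code: the class, fields, and method names below are hypothetical stand-ins, and only the state names are taken from the log. The assumption is that the timeout handler does a compare-and-set style transition that expects the node to be in RS_ZK_REGION_OPENING, and that a failed transition is treated as "nothing to do", which leaves the in-memory state untouched so the next timeout hits the same branch.

```java
// Toy model of the perpetual timeout cycle; NOT the HBase implementation.
// All names here are hypothetical except the state names copied from the log.
class TimeoutCycleSketch {
    enum ZkState { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }
    enum RegionState { PENDING_OPEN, OFFLINE }

    // Starting point seen in the log: the ZK node is already OFFLINE while
    // the master's in-memory state still says PENDING_OPEN.
    ZkState zkNode = ZkState.M_ZK_REGION_OFFLINE;
    RegionState inMemory = RegionState.PENDING_OPEN;

    // Compare-and-set style transition: succeeds only from the expected state.
    boolean transitionNode(ZkState expected, ZkState target) {
        if (zkNode != expected) {
            return false; // "the node existed but was in the state ..."
        }
        zkNode = target;
        return true;
    }

    // Assumed timeout handling: a failed transition is skipped, and nothing
    // else changes, so the same timeout fires again next period.
    String onTimeout() {
        boolean moved = transitionNode(ZkState.RS_ZK_REGION_OPENING,
                                       ZkState.M_ZK_REGION_OFFLINE);
        if (!moved) {
            return "skipping timeout";
        }
        inMemory = RegionState.OFFLINE;
        return "reassigning";
    }

    public static void main(String[] args) {
        TimeoutCycleSketch region = new TimeoutCycleSketch();
        for (int period = 1; period <= 3; period++) {
            System.out.println("timeout " + period + ": " + region.onTimeout()
                + ", in-memory=" + region.inMemory + ", zk=" + region.zkNode);
        }
    }
}
```

In this model no state changes between timeout periods, so every pass takes the skip branch. Whether the real fix belongs in the expected begin state, in the handling of the failed transition, or somewhere else is not something the sketch can say.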