[
https://issues.apache.org/jira/browse/HBASE-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883239#action_12883239
]
Karthik Ranganathan commented on HBASE-2781:
--------------------------------------------
Just wanted to give an update: I am working on the test case for this and will
upload the patch along with the JUnit test.
> ZKW.createUnassignedRegion doesn't make sure existing znode is in the right
> state
> ---------------------------------------------------------------------------------
>
> Key: HBASE-2781
> URL: https://issues.apache.org/jira/browse/HBASE-2781
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Assignee: Karthik Ranganathan
> Priority: Critical
> Fix For: 0.21.0
>
>
> In ZKW.createUnassignedRegion I see this comment:
> {code}
> // check if this node already exists -
> // - it should not exist
> // - if it does, it should be in the CLOSED state
> {code}
> And what I got is:
> {noformat}
> 2010-06-23 15:42:05,823 INFO [IPC Server handler 3 on 60362]
> master.ServerManager(457): Processing MSG_REPORT_PROCESS_OPEN:
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. from
> h136.sfo.stumble.net,60365,1277332849712; 1 of 4
> 2010-06-23 15:42:05,867 INFO [RegionServer:1.worker]
> regionserver.HRegionServer$Worker(1338): Worker: MSG_REGION_OPEN:
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:05,870 DEBUG [RegionServer:1.worker]
> regionserver.RSZookeeperUpdater(157): Updating ZNode
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENING]
> expected version = 0
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.HMaster(1158): Event
> NodeDataChanged with state SyncConnected with path
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread]
> master.ZKMasterAddressWatcher(64): Got event NodeDataChanged with path
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread]
> master.ZKUnassignedWatcher(95): ZK-EVENT-PROCESS: Got zkEvent NodeDataChanged
> state:SyncConnected path:/1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,872 INFO [main-EventThread]
> regionserver.HRegionServer(379): Got ZooKeeper event, state: SyncConnected,
> type: NodeDataChanged, path: /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,872 DEBUG [MASTER_OPENREGION-10.10.1.136:60362-1]
> handler.MasterOpenRegionHandler(77): Event = RS2ZK_REGION_OPENING, region =
> 13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,874 DEBUG [RegionServer:1.worker]
> regionserver.HRegion(297): Creating region
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,154 INFO [RegionServer:1.worker]
> regionserver.HRegion(366): Onlined
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.; next sequenceid=1
> 2010-06-23 15:42:06,154 DEBUG [RegionServer:1.worker]
> regionserver.RSZookeeperUpdater(157): Updating ZNode
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENED]
> expected version = 1
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
> = ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:06,249 ERROR [RegionServer:1.worker]
> regionserver.HRegionServer(1488): Failed to mark region
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. as opened
> java.io.IOException:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
> = ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1]
> regionserver.HRegionServer(1569): closing region
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(487):
> Closing test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.: disabling
> compactions & flushes
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(512):
> Updates disabled for region, no outstanding scanners on
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(519): No
> more row locks outstanding on region
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,994 INFO [RegionServer:1] regionserver.HRegion(531):
> Closed test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:09,105 INFO [master] master.ProcessServerShutdown(126):
> Region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. was in
> transition
> name=test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.,
> state=PENDING_OPEN on dead server h136.sfo.stumble.net,60365,1277332849712 -
> marking unassigned
> 2010-06-23 15:42:10,065 INFO [IPC Server handler 2 on 60362]
> master.RegionManager(340): Assigning region
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. to
> h136.sfo.stumble.net,60363,1277332849671
> 2010-06-23 15:42:10,067 DEBUG [IPC Server handler 2 on 60362]
> zookeeper.ZooKeeperWrapper(1079): While creating UNASSIGNED region
> 13bef4950ac6827ac32d87682b8b2464 exists, state = RS2ZK_REGION_OPENING
> 2010-06-23 15:42:10,126 WARN [IPC Server handler 2 on 60362]
> zookeeper.ZooKeeperWrapper(1024):
> <localhost:/1,org.apache.hadoop.hbase.master.HMaster>Failed to create ZNode
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 in ZooKeeper
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
> NodeExists for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:10,127 DEBUG [IPC Server handler 2 on 60362]
> master.RegionManager(350): Created UNASSIGNED zNode
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. in state
> M2ZK_REGION_OFFLINE
> 2010-06-23 15:42:10,245 INFO [RegionServer:0]
> regionserver.HRegionServer(511): MSG_REGION_OPEN:
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:11,248 INFO [IPC Server handler 1 on 60362]
> master.ServerManager(457): Processing MSG_REPORT_PROCESS_OPEN:
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. from
> h136.sfo.stumble.net,60363,1277332849671; 7 of 13
> 2010-06-23 15:42:13,795 INFO [RegionServer:0.worker]
> regionserver.HRegionServer$Worker(1338): Worker: MSG_REGION_OPEN:
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:13,797 ERROR [RegionServer:0.worker]
> regionserver.RSZookeeperUpdater(107): ZNode
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 is not in CLOSED/OFFLINE state
> (state = RS2ZK_REGION_OPENING), will NOT open region.
> 2010-06-23 15:42:13,798 ERROR [RegionServer:0.worker]
> regionserver.HRegionServer(814): Error opening
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> java.io.IOException: ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 is
> not in CLOSED/OFFLINE state (state = RS2ZK_REGION_OPENING), will NOT open
> region.
> 2010-06-23 15:42:13,800 ERROR [RegionServer:0.worker]
> regionserver.RSZookeeperUpdater(141): Aborting open of region
> 13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:13,800 DEBUG [RegionServer:0.worker]
> regionserver.RSZookeeperUpdater(157): Updating ZNode
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_CLOSED]
> expected version = 0
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:13,802 ERROR [RegionServer:0.worker]
> regionserver.HRegionServer(1473): Failed to abort open region
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> java.io.IOException:
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> {noformat}
> Basically:
> # A region server was opening the region.
> # It was expired just before it could report the region as opened, leaving
> the znode in the state RS2ZK_REGION_OPENING.
> # The region gets reassigned; the master sees that state and doesn't change
> it, but still logs at the end "Created UNASSIGNED zNode
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. in state
> M2ZK_REGION_OFFLINE".
> # When the new region server tries to open the region, it sees that the
> state is wrong and aborts the open.
> I think the way to fix it is to have createUnassignedRegion change the
> existing znode's state to what it should be.
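
To make the sequence in the quoted description concrete, here is a minimal sketch of the two behaviours. It is an in-memory model only, not the HBase or ZooKeeper code: the class UnassignedZNodeSketch and its methods are hypothetical names made up for illustration, and only the state names and the CLOSED/OFFLINE check are taken from the log above. It also assumes the region server reads the current version before updating, which may not match the real RSZookeeperUpdater.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the UNASSIGNED znodes; not HBase or ZooKeeper code. */
class UnassignedZNodeSketch {
    enum State { M2ZK_REGION_OFFLINE, RS2ZK_REGION_OPENING, RS2ZK_REGION_OPENED, RS2ZK_REGION_CLOSED }

    /** A znode's payload plus a data version that is bumped on every write. */
    static final class ZNode {
        State state;
        int version;
        ZNode(State state) { this.state = state; }
    }

    private final Map<String, ZNode> nodes = new HashMap<>();

    /** Versioned write: succeeds only if the caller's expected version matches. */
    boolean update(String region, State newState, int expectedVersion) {
        ZNode node = nodes.get(region);
        if (node == null || node.version != expectedVersion) {
            return false; // stands in for NoNode / BadVersion
        }
        node.state = newState;
        node.version++;
        return true;
    }

    /** Behaviour seen in the log: an existing znode is left in whatever state it has. */
    State createUnassignedKeepingExisting(String region) {
        nodes.putIfAbsent(region, new ZNode(State.M2ZK_REGION_OFFLINE));
        return nodes.get(region).state;
    }

    /** Proposed behaviour (assumption): an existing znode in the wrong state is reset to OFFLINE. */
    State createUnassignedForcingOffline(String region) {
        ZNode node = nodes.get(region);
        if (node == null) {
            nodes.put(region, new ZNode(State.M2ZK_REGION_OFFLINE));
        } else if (node.state != State.M2ZK_REGION_OFFLINE) {
            node.state = State.M2ZK_REGION_OFFLINE;
            node.version++; // an overwrite counts as a write, so the version moves on
        }
        return nodes.get(region).state;
    }

    /** Region server side: open only from OFFLINE or CLOSED, using the version just read. */
    boolean tryStartOpening(String region) {
        ZNode node = nodes.get(region);
        if (node == null) {
            return false;
        }
        boolean openable = node.state == State.M2ZK_REGION_OFFLINE
                || node.state == State.RS2ZK_REGION_CLOSED;
        return openable && update(region, State.RS2ZK_REGION_OPENING, node.version);
    }

    public static void main(String[] args) {
        String region = "13bef4950ac6827ac32d87682b8b2464";

        UnassignedZNodeSketch keeping = new UnassignedZNodeSketch();
        keeping.createUnassignedKeepingExisting(region);
        keeping.tryStartOpening(region);                 // first server dies after this step
        keeping.createUnassignedKeepingExisting(region); // reassignment leaves OPENING in place
        System.out.println("keep existing: second open allowed = " + keeping.tryStartOpening(region));

        UnassignedZNodeSketch forcing = new UnassignedZNodeSketch();
        forcing.createUnassignedForcingOffline(region);
        forcing.tryStartOpening(region);
        forcing.createUnassignedForcingOffline(region);  // reassignment resets the state
        System.out.println("force offline: second open allowed = " + forcing.tryStartOpening(region));
    }
}
```

In this model the first scenario refuses the second open, as in the log, while the second scenario allows it. Whether the real fix should overwrite the znode or delete and recreate it is a separate design question that the sketch does not settle.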