[ 
https://issues.apache.org/jira/browse/HBASE-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reopened HBASE-2781:
---------------------------------------


Hate to do this, but this patch was missing a few things. TestReplication 
failed again for the same reason: some parts of RegionManager are still 
calling createUnassignedRegion instead of createOrUpdateUnassignedRegion. 
Should we just redirect all the calls to the latter and delete the former?
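
To make the difference between the two calls concrete, here is a minimal 
sketch that uses an in-memory map in place of ZooKeeper. The class 
UnassignedZNodeSketch, its helper methods and the region name "r1" are 
hypothetical and only the two method names and the state names come from 
this issue; it is not the actual ZooKeeperWrapper code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the UNASSIGNED znodes; this is NOT
// the real ZooKeeperWrapper code, only an illustration of the two behaviours.
class UnassignedZNodeSketch {

    // State names as they appear in the log output quoted in this issue.
    enum State {
        M2ZK_REGION_OFFLINE,
        RS2ZK_REGION_OPENING,
        RS2ZK_REGION_OPENED,
        RS2ZK_REGION_CLOSED
    }

    // A znode's payload plus a ZooKeeper-style data version.
    private static final class ZNode {
        State state;
        int version;

        ZNode(State state) {
            this.state = state;
            this.version = 0;
        }
    }

    private final Map<String, ZNode> nodes = new HashMap<>();

    // Create-only: an existing node is left in whatever state it is in.
    // Returns true only if a new node was created.
    boolean createUnassignedRegion(String region) {
        if (nodes.containsKey(region)) {
            return false;
        }
        nodes.put(region, new ZNode(State.M2ZK_REGION_OFFLINE));
        return true;
    }

    // Create-or-update: an existing node is forced back to OFFLINE and its
    // version is bumped, as a data write would. Returns the resulting version.
    int createOrUpdateUnassignedRegion(String region) {
        ZNode node = nodes.get(region);
        if (node == null) {
            nodes.put(region, new ZNode(State.M2ZK_REGION_OFFLINE));
            return 0;
        }
        node.state = State.M2ZK_REGION_OFFLINE;
        node.version++;
        return node.version;
    }

    // Versioned update, as a region server would do; fails on a stale version.
    boolean updateState(String region, State newState, int expectedVersion) {
        ZNode node = nodes.get(region);
        if (node == null || node.version != expectedVersion) {
            return false;
        }
        node.state = newState;
        node.version++;
        return true;
    }

    // Current state of the node, or null if it does not exist.
    State stateOf(String region) {
        ZNode node = nodes.get(region);
        return node == null ? null : node.state;
    }

    public static void main(String[] args) {
        UnassignedZNodeSketch zk = new UnassignedZNodeSketch();
        zk.createUnassignedRegion("r1");
        zk.updateState("r1", State.RS2ZK_REGION_OPENING, 0);
        // The region server dies here, so the node is stuck in OPENING.
        zk.createUnassignedRegion("r1");
        System.out.println("create-only:      " + zk.stateOf("r1"));
        zk.createOrUpdateUnassignedRegion("r1");
        System.out.println("create-or-update: " + zk.stateOf("r1"));
    }
}
```

In this sketch the create-only call leaves the stale OPENING state in 
place, while create-or-update resets it to OFFLINE and returns the new 
version, so the next region server would have to use that version rather 
than assume 0.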

> ZKW.createUnassignedRegion doesn't make sure existing znode is in the right 
> state
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-2781
>                 URL: https://issues.apache.org/jira/browse/HBASE-2781
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Karthik Ranganathan
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: HBASE-2781-0.21.patch
>
>
> In ZKW.createUnassignedRegion I see this comment:
> {code}
>       // check if this node already exists - 
>       //   - it should not exist
>       //   - if it does, it should be in the CLOSED state
> {code}
> And what I got is:
> {noformat}
> 2010-06-23 15:42:05,823 INFO  [IPC Server handler 3 on 60362] 
> master.ServerManager(457): Processing MSG_REPORT_PROCESS_OPEN: 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. from 
> h136.sfo.stumble.net,60365,1277332849712; 1 of 4
> 2010-06-23 15:42:05,867 INFO  [RegionServer:1.worker] 
> regionserver.HRegionServer$Worker(1338): Worker: MSG_REGION_OPEN: 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:05,870 DEBUG [RegionServer:1.worker] 
> regionserver.RSZookeeperUpdater(157): Updating ZNode 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENING] 
> expected version = 0
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.HMaster(1158): Event 
> NodeDataChanged with state SyncConnected with path 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] 
> master.ZKMasterAddressWatcher(64): Got event NodeDataChanged with path 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,871 DEBUG [main-EventThread] 
> master.ZKUnassignedWatcher(95): ZK-EVENT-PROCESS: Got zkEvent NodeDataChanged 
> state:SyncConnected path:/1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,872 INFO  [main-EventThread] 
> regionserver.HRegionServer(379): Got ZooKeeper event, state: SyncConnected, 
> type: NodeDataChanged, path: /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,872 DEBUG [MASTER_OPENREGION-10.10.1.136:60362-1] 
> handler.MasterOpenRegionHandler(77): Event = RS2ZK_REGION_OPENING, region = 
> 13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:05,874 DEBUG [RegionServer:1.worker] 
> regionserver.HRegion(297): Creating region 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,154 INFO  [RegionServer:1.worker] 
> regionserver.HRegion(366): Onlined 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.; next sequenceid=1
> 2010-06-23 15:42:06,154 DEBUG [RegionServer:1.worker] 
> regionserver.RSZookeeperUpdater(157): Updating ZNode 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENED] 
> expected version = 1
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:06,249 ERROR [RegionServer:1.worker] 
> regionserver.HRegionServer(1488): Failed to mark region 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. as opened
> java.io.IOException: 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] 
> regionserver.HRegionServer(1569): closing region 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(487): 
> Closing test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.: disabling 
> compactions & flushes
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(512): 
> Updates disabled for region, no outstanding scanners on 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(519): No 
> more row locks outstanding on region 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:06,994 INFO  [RegionServer:1] regionserver.HRegion(531): 
> Closed test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:09,105 INFO  [master] master.ProcessServerShutdown(126): 
> Region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. was in 
> transition 
> name=test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464., 
> state=PENDING_OPEN on dead server h136.sfo.stumble.net,60365,1277332849712 - 
> marking unassigned
> 2010-06-23 15:42:10,065 INFO  [IPC Server handler 2 on 60362] 
> master.RegionManager(340): Assigning region 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. to 
> h136.sfo.stumble.net,60363,1277332849671
> 2010-06-23 15:42:10,067 DEBUG [IPC Server handler 2 on 60362] 
> zookeeper.ZooKeeperWrapper(1079): While creating UNASSIGNED region 
> 13bef4950ac6827ac32d87682b8b2464 exists, state = RS2ZK_REGION_OPENING
> 2010-06-23 15:42:10,126 WARN  [IPC Server handler 2 on 60362] 
> zookeeper.ZooKeeperWrapper(1024): 
> <localhost:/1,org.apache.hadoop.hbase.master.HMaster>Failed to create ZNode 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 in ZooKeeper
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:10,127 DEBUG [IPC Server handler 2 on 60362] 
> master.RegionManager(350): Created UNASSIGNED zNode 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. in state 
> M2ZK_REGION_OFFLINE
> 2010-06-23 15:42:10,245 INFO  [RegionServer:0] 
> regionserver.HRegionServer(511): MSG_REGION_OPEN: 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:11,248 INFO  [IPC Server handler 1 on 60362] 
> master.ServerManager(457): Processing MSG_REPORT_PROCESS_OPEN: 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. from 
> h136.sfo.stumble.net,60363,1277332849671; 7 of 13
> 2010-06-23 15:42:13,795 INFO  [RegionServer:0.worker] 
> regionserver.HRegionServer$Worker(1338): Worker: MSG_REGION_OPEN: 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> 2010-06-23 15:42:13,797 ERROR [RegionServer:0.worker] 
> regionserver.RSZookeeperUpdater(107): ZNode 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 is not in CLOSED/OFFLINE state 
> (state = RS2ZK_REGION_OPENING), will NOT open region.
> 2010-06-23 15:42:13,798 ERROR [RegionServer:0.worker] 
> regionserver.HRegionServer(814): Error opening 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> java.io.IOException: ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 is 
> not in CLOSED/OFFLINE state (state = RS2ZK_REGION_OPENING), will NOT open 
> region.
> 2010-06-23 15:42:13,800 ERROR [RegionServer:0.worker] 
> regionserver.RSZookeeperUpdater(141): Aborting open of region 
> 13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:13,800 DEBUG [RegionServer:0.worker] 
> regionserver.RSZookeeperUpdater(157): Updating ZNode 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_CLOSED] 
> expected version = 0
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> 2010-06-23 15:42:13,802 ERROR [RegionServer:0.worker] 
> regionserver.HRegionServer(1473): Failed to abort open region 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.
> java.io.IOException: 
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: 
> KeeperErrorCode = BadVersion for 
> /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464
> {noformat}
> Basically:
>  # A region server was opening the region
>  # It was expired just before reporting that the region was opened, leaving 
> the znode in the state RS2ZK_REGION_OPENING
>  # The region gets reassigned; the master sees that state, doesn't change 
> it, but still logs at the end "Created UNASSIGNED zNode 
> test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. in state 
> M2ZK_REGION_OFFLINE"
>  # When the new region server tries to open the region, it sees that the 
> state is wrong and aborts the open
> I think the way to fix it is to change the znode's state to what it should 
> be when the node already exists.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
