[ https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932814#action_12932814 ]

Todd Lipcon commented on HBASE-3243:
------------------------------------

Here are some relevant logs. The region in question is 
user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1 (table usertable).

Here's the birth of the region on haus01, as a daughter of a split:
2010-11-16 02:28:10,471 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1.
2010-11-16 02:28:11,432 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, META updated, and report to master. Parent=usertable,user1057689679,1289901563182.24d8aec47045a86b97b3508ed16ced29., new regions: usertable,user1057689679,1289903283192.b313a3864e1eb1aaf2a72dcf2c2814b3., usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1.. Split took 8sec

The master receives the split report:
2010-11-16 02:28:11,450 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: usertable,user1057689679,1289901563182.24d8aec47045a86b97b3508ed16ced29.: Daughters; usertable,user1057689679,1289903283192.b313a3864e1eb1aaf2a72dcf2c2814b3., usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1. from haus01.sf.cloudera.com,60020,1289890926766

After that, there are plenty of successful flushes and compactions on haus01, and 
no mention of the region on any other region server. The benchmark finished 
around 3:20am, and the region does not appear in any logs from that point until 
I disabled the table this evening:

Master says:
2010-11-16 18:16:17,223 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1. (offlining)

Region server *haus02* (NOTE: this is the wrong region server!) says:
2010-11-16 18:16:17,238 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1.
2010-11-16 18:16:17,239 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Received close for region we are not serving; 8a2c525797cad9fd9882cc6304362bd1

Master says:
2010-11-16 18:16:17,240 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to send CLOSE to serverName=haus02.sf.cloudera.com,60020,1289890927217, load=(requests=1029, regions=28, usedHeap=4772, maxHeap=8185) for region usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1. but failed, setting region as OFFLINE and reassigning
...
2010-11-16 18:16:17,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1. state=PENDING_CLOSE, ts=1289960177227
2010-11-16 18:16:17,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: Table usertable disabling; skipping assign of usertable,user1072135606,1289903283192.8a2c525797cad9fd9882cc6304362bd1.
2010-11-16 18:16:17,248 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12c537d84e10000 Deleting existing unassigned node for 8a2c525797cad9fd9882cc6304362bd1 that is in expected state RS_ZK_REGION_CLOSED
2010-11-16 18:16:17,248 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x12c537d84e10000 Unable to get data of znode /hbase/unassigned/8a2c525797cad9fd9882cc6304362bd1 because node does not exist (not necessarily an error)


So basically, the master somehow ended up believing this region was on the wrong 
region server (haus02 instead of haus01), and the logs give little indication why.

> Disable Table closed region on wrong host
> -----------------------------------------
>
>                 Key: HBASE-3243
>                 URL: https://issues.apache.org/jira/browse/HBASE-3243
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.90.0
>
>
> I ran some YCSB benchmarks which resulted in about 150 regions' worth of data 
> overnight. Then I disabled the table, and for some reason the master sent the 
> close for one region to the wrong server. That server ignored it, but the 
> region remained open on a different server, which later flipped out when it 
> tried to flush due to hlog accumulation.
