[ 
https://issues.apache.org/jira/browse/HBASE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917042#action_12917042
 ] 

HBase Review Board commented on HBASE-3068:
-------------------------------------------

Message from: [email protected]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/930/
-----------------------------------------------------------

(Updated 2010-10-01 14:05:45.726776)


Review request for hbase and Jonathan Gray.


Changes
-------

Update to javadoc and comments


Summary
-------

The fix is two-fold.

First, added a new facility: on a successful open, we update the timers
on all regions in transition that are on the same server.

Second, in the timeout monitor, we now do the necessary cleanup and state
transitions so that regions are in the proper state when we go to re-assign.
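
As a rough illustration of the first part of the fix, here is a minimal sketch of refreshing timers on a successful open. It is not the code in the diff: the class TransitionTimers, its methods, and the use of plain strings for regions and servers are all hypothetical simplifications.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified stand-in for the master's in-transition bookkeeping. */
class TransitionTimers {
    /** Region name -> destination server of its pending plan. */
    final Map<String, String> regionPlans = new ConcurrentHashMap<>();
    /** Region name -> timestamp (ms) of the last state update. */
    final Map<String, Long> regionsInTransition = new ConcurrentHashMap<>();

    /** Record that a region has started moving to the given server. */
    void startTransition(String region, String server, long nowMs) {
        regionPlans.put(region, server);
        regionsInTransition.put(region, nowMs);
    }

    /** On a successful open: drop the opened region, then refresh its siblings. */
    void onRegionOpened(String region, String server, long nowMs) {
        regionPlans.remove(region);
        regionsInTransition.remove(region);
        updateTimers(server, nowMs);
    }

    /** Reset the timestamp of every in-transition region planned for the server. */
    void updateTimers(String server, long nowMs) {
        for (Map.Entry<String, String> plan : regionPlans.entrySet()) {
            if (plan.getValue().equals(server)) {
                regionsInTransition.computeIfPresent(plan.getKey(), (r, ts) -> nowMs);
            }
        }
    }

    /** True if the region has gone longer than timeoutMs without an update. */
    boolean isTimedOut(String region, long nowMs, long timeoutMs) {
        Long ts = regionsInTransition.get(region);
        return ts != null && nowMs - ts > timeoutMs;
    }
}
```

The idea is that a server working through a long queue of opens is making progress, so each completed open pushes back the deadline for the regions still queued behind it.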

M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Changed regionPlans to be a concurrent skip list.  This makes life
  easier: nowhere do we need a lock on regionPlans to span anything
  other than changes to regionPlans itself.
  Added to the processing of a successful region open the cleanup
  of its regionPlan and a run of updateTimers.
  Replaced some code that duplicated setOffline with a call to it.
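
To show why a concurrent skip list removes the need for an external lock here, a small sketch using java.util.concurrent.ConcurrentSkipListMap follows. The RegionPlanStore class and its String-valued plans are hypothetical; the real map holds plan objects, and this is not the code from the diff.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical wrapper around a region-name -> destination-server plan map. */
class RegionPlanStore {
    private final ConcurrentSkipListMap<String, String> plans =
        new ConcurrentSkipListMap<>();

    /** Single-key writes are atomic on their own; no synchronized block needed. */
    void addPlan(String region, String destination) {
        plans.put(region, destination);
    }

    /** Called on a successful open. Returns the removed destination, or null. */
    String clearPlan(String region) {
        return plans.remove(region);
    }

    /** Iteration is weakly consistent, so it is safe while other threads mutate. */
    int countPlansFor(String server) {
        int count = 0;
        for (Map.Entry<String, String> e : plans.entrySet()) {
            if (e.getValue().equals(server)) {
                count++;
            }
        }
        return count;
    }

    /** Removing through the map while iterating does not throw here. */
    int clearPlansFor(String server) {
        int removed = 0;
        for (Map.Entry<String, String> e : plans.entrySet()) {
            // remove(key, value) only removes if the mapping is unchanged.
            if (e.getValue().equals(server) && plans.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }
}
```

With a plain TreeMap, the iterator is fail-fast, so clearPlansFor would typically fail with a ConcurrentModificationException unless every access were wrapped in a shared lock.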


This addresses bug hbase-3068.
    http://issues.apache.org/jira/browse/hbase-3068


Diffs (updated)
-----

  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
1003330 

Diff: http://review.cloudera.org/r/930/diff


Testing
-------

Basic unit tests seem to be passing.  Testing now up on cluster.


Thanks,

stack




> IllegalStateException when new server comes online, is given 200 regions to 
> open and 200th region gets timed out of regions in transition
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3068
>                 URL: https://issues.apache.org/jira/browse/HBASE-3068
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.90.0
>
>
> Yesterday we committed a change that makes the master crash if a 
> zk transition is unexpected.   It's extreme but good for highlighting bad 
> state changes (we also started marking these as IllegalStateExceptions 
> yesterday).
> So, testing the new master, I brought up a new server.  The balancer tried 
> to give the new server 256 regions.
> {code}
> 2010-10-01 16:01:42,972 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
> Calculated a load balance in 0ms. Moving 256 regions off of 7 overloaded 
> servers onto 1 less loaded servers
> {code}
> Turns out we failed to complete the open of all 256 regions within the 
> regions-in-transition timeout period, so we tried to reassign.  The master 
> aborted because the region was in the PENDING_OPEN state when we went about 
> assigning.
> {code}
> 2010-10-01 16:02:28,809 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  
> usertable,user1128734802,1285701924906.006696a9bf346f8593df66728e18e029. 
> state=PENDING_OPEN, ts=1285948921051
> 2010-10-01 16:02:28,809 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_OPEN or OPENING for too long, reassigning 
> region=usertable,user1128734802,1285701924906.006696a9bf346f8593df66728e18e029.
> 2010-10-01 16:02:28,811 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unexpected state trying to OFFLINE; 
> usertable,user1128734802,1285701924906.006696a9bf346f8593df66728e18e029. 
> state=PENDING_OPEN, ts=1285948921051
> java.lang.IllegalStateException
>     at 
> org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:662)
>     at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:632)
>     at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:560)
>     at 
> org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1102)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> {code}
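
The abort in the trace above comes from a strict state check on the way to OFFLINE. A minimal sketch of such a guard, and of a timeout path that cleans up state before reassigning, might look like the following. The RegionStateGuard class, the set of states accepted by the guard, and the choice to reset a stuck region to CLOSED are all assumptions for illustration, not the actual AssignmentManager logic.

```java
/** Hypothetical model of one region's state as tracked by the master. */
class RegionStateGuard {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSED }

    private State state;

    RegionStateGuard(State initial) {
        this.state = initial;
    }

    State getState() {
        return state;
    }

    /** Strict guard: only CLOSED or OFFLINE may go OFFLINE (assumed set). */
    void setOffline() {
        if (state != State.CLOSED && state != State.OFFLINE) {
            throw new IllegalStateException(
                "Unexpected state trying to OFFLINE; state=" + state);
        }
        state = State.OFFLINE;
    }

    /**
     * Timeout path: clean up a stuck region first, then pass through the guard.
     * Returns true if the caller should go on to reassign the region.
     */
    boolean reassignAfterTimeout() {
        if (state == State.OPEN) {
            return false; // The open completed after all; nothing to reassign.
        }
        if (state == State.PENDING_OPEN || state == State.OPENING) {
            state = State.CLOSED; // Hypothetical cleanup transition.
        }
        setOffline();
        return true;
    }
}
```

Calling setOffline directly on a PENDING_OPEN region reproduces the exception, while reassignAfterTimeout leaves the region OFFLINE and ready to be assigned again.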

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
