[jira] Commented: (HBASE-3010) Can't start/stop/start... cluster using new master

HBase Review Board (JIRA) Fri, 17 Sep 2010 16:48:01 -0700

    [ https://issues.apache.org/jira/browse/HBASE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910855#action_12910855 ]


HBase Review Board commented on HBASE-3010:
-------------------------------------------

Message from: "Todd Lipcon" <[email protected]>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/873/#review1267
-----------------------------------------------------------

Ship it!



src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
<http://review.cloudera.org/r/873/#comment4312>

    hrm, I guess that's a good idea, but something seems a little strange about this :)



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/873/#comment4313>

    this should probably move down until after we're the active master


- Todd





> Can't start/stop/start... cluster using new master
> --------------------------------------------------
>
>                 Key: HBASE-3010
>                 URL: https://issues.apache.org/jira/browse/HBASE-3010
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>
> Currently you might start a small cluster the first time on TRUNK -- i.e. the
> new master -- but the second time you start it up you run into a couple of
> interesting issues:
> + The old root-region-location is still in place. It gets cleaned up later,
> but for a while during startup it does not hold the 'right' address.
> + On startup, a regionserver (or a client) creates a catalogtracker, a class
> that notices changes in the meta tables and keeps track of catalog table
> locations. Starting the catalogtracker triggers a check of the current
> catalog locations. As part of this process, because root-region-location
> "exists", the catalogtracker tries to verify root's location by doing a noop
> against the root host. To do this, however, it first has to do the initial
> RPC proxy setup. It can happen that the old root address belongs to the very
> regionserver that is now initializing, so we end up trying to connect to
> ourselves to verify the root location. But we're doing this before we've set
> up the RPC server and handlers, so we block, and as it happens there is no
> timeout on proxy setup (Todd ran into this yesterday, I ran into it today;
> it's easy to manufacture).
> + So the regionserver can't make progress. Meanwhile the master can't make
> progress because no regionservers are checking in. And you can't shut it
> down because we're not looking at the right 'stop' flag.
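The hang described above combines two problems: a connection setup that keeps retrying with no deadline, and a wait that never checks the shutdown flag. Below is a minimal Java sketch of the general idea of bounding such a wait. It is not the change that was actually committed for HBASE-3010. The class BoundedConnect and its connect() method are hypothetical, and the Callable is assumed to stand in for whatever opens the RPC proxy to the root host.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper, not an HBase API: retries a connection attempt until it
 * succeeds, a deadline passes, or a shared stop flag is raised.
 */
public final class BoundedConnect {

    private BoundedConnect() {}

    /**
     * @param attempt   one connection attempt (assumed: e.g. opening an RPC proxy); throws on failure
     * @param timeoutMs total time budget across all attempts
     * @param pauseMs   sleep between failed attempts
     * @param stopped   shutdown flag, checked before every attempt
     */
    public static <T> T connect(Callable<T> attempt, long timeoutMs, long pauseMs,
                                AtomicBoolean stopped)
            throws TimeoutException, InterruptedException {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        Exception last = null;
        while (!stopped.get()) {
            try {
                return attempt.call();
            } catch (InterruptedException e) {
                throw e; // do not swallow thread interruption
            } catch (Exception e) {
                last = e; // keep the most recent failure as the timeout's cause
            }
            if (System.nanoTime() - deadline >= 0) {
                TimeoutException te =
                    new TimeoutException("gave up connecting after " + timeoutMs + " ms");
                te.initCause(last);
                throw te;
            }
            Thread.sleep(pauseMs); // back off before the next attempt
        }
        // Shutdown was requested: stop waiting instead of blocking forever.
        throw new CancellationException("stop flag raised while connecting");
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean stopped = new AtomicBoolean(false);
        try {
            // Simulates dialing a peer (possibly ourselves) whose RPC server is not up yet.
            connect(() -> { throw new ConnectException("connection refused"); },
                    200, 20, stopped);
        } catch (TimeoutException e) {
            // prints "timed out, last error: connection refused"
            System.out.println("timed out, last error: " + e.getCause().getMessage());
        }
    }
}
```

Because the stop flag is checked on every iteration, a shutdown request is noticed within roughly one pause interval. The deadline turns an indefinite block into a TimeoutException that the caller can handle instead of hanging startup.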

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
