start... cluster using new master

HBase Review Board (JIRA) Fri, 17 Sep 2010 18:07:56 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910884#action_12910884
 ]


HBase Review Board commented on HBASE-3010:
-------------------------------------------

Message from: [email protected]


bq.  On 2010-09-17 16:25:15, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 142
bq.  > <http://review.cloudera.org/r/873/diff/1/?file=11929#file11929line142>
bq.  >
bq.  >     hrm, I guess that's a good idea, but something seems a little strange about this :)

Yeah, this is a little 'bold', but thinking around it I couldn't see an 
issue with it, whereas not doing it is going to frustrate, since a restart will 
stall for a minute or so while we wait on the znode to expire.  I'd say it's 
good for now, and I suppose we'll see later if it becomes a problem.


bq.  On 2010-09-17 16:25:15, Todd Lipcon wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 222
bq.  > <http://review.cloudera.org/r/873/diff/1/?file=11931#file11931line222>
bq.  >
bq.  >     this should probably move down until after we're the active master

sure... will do on commit.


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/873/#review1267
-----------------------------------------------------------





> Can't start/stop/start... cluster using new master
> --------------------------------------------------
>
>                 Key: HBASE-3010
>                 URL: https://issues.apache.org/jira/browse/HBASE-3010
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>
> Currently you can start a small cluster the first time on TRUNK -- i.e. with 
> the new master -- but the second time you start it up you run into a couple 
> of interesting issues:
> + The old root-region-location is still in place. It gets cleaned up later, 
> but for a while on startup it does not have the 'right' address.
> + A regionserver (or a client) creates a catalogtracker on startup, a class 
> that notices changes in the meta tables and keeps catalog table locations up 
> to date.  Starting the catalogtracker triggers a check of the current catalog 
> locations.  As part of this check, since root-region-location "exists", the 
> catalogtracker tries to verify root's location by doing a noop against the 
> root host; to do this it first needs to do the initial rpc proxy setup.  It 
> can happen that the old root address is that of the very regionserver that 
> is trying to initialize, so we end up trying to connect to ourselves to 
> verify root location.  However, we do this before we've set up the rpcserver 
> and handlers, so we block, and as it happens there is no timeout on proxy 
> setup (Todd ran into this yesterday, I ran into it today -- it's easy to 
> manufacture).
> + So the regionserver can't progress.  Meanwhile the master can't progress 
> because there are no regionservers checking in.  And you can't shut it down 
> because we're not looking at the right 'stop' flag.
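
The self-connect hang in the second item above comes down to an unbounded 
wait: the verify step keeps trying to reach an address where nothing is 
serving yet. The Java sketch below shows one way to bound such a wait. It is 
not the HBase code and not the change made for HBASE-3010. The names 
BoundedVerify, tryConnect and verifyWithDeadline are hypothetical, and a plain 
TCP connect stands in for the rpc proxy setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.BooleanSupplier;

public class BoundedVerify {

    // Hypothetical stand-in for "rpc proxy setup": one TCP connect attempt
    // that gives up after connectTimeoutMs instead of waiting indefinitely.
    static boolean tryConnect(InetSocketAddress addr, int connectTimeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, connectTimeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, or timed out: the location is not verified.
            return false;
        }
    }

    // Retries the probe until it succeeds or the overall deadline passes.
    // Returns false on timeout or interruption rather than blocking forever.
    static boolean verifyWithDeadline(BooleanSupplier probe, long timeoutMs, long retryMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(retryMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption for the demo: grab a free local port and release it, so
        // the address looks like a stale location with no server behind it.
        int port;
        try (ServerSocket ss = new ServerSocket(0)) {
            port = ss.getLocalPort();
        }
        InetSocketAddress stale = new InetSocketAddress("127.0.0.1", port);

        boolean verified = verifyWithDeadline(() -> tryConnect(stale, 200), 1000, 100);
        System.out.println("stale location verified: " + verified);
    }
}
```

With a deadline in place the caller gets a definite answer and can fall back 
to treating the stored location as stale, rather than stalling startup.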

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

