[jira] Commented: (HBASE-3047) If new master crashes, restart is messy

HBase Review Board (JIRA) Tue, 28 Sep 2010 21:02:07 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916007#action_12916007
 ]


HBase Review Board commented on HBASE-3047:
-------------------------------------------

Message from: [email protected]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/915/#review1353
-----------------------------------------------------------


Here are a few comments on yours.

Actually, testing this patch on a cluster brought up some issues.  I think I 
should recast it; I have some ideas on how.  v2 is coming and will incorporate 
your comments below.


trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<http://review.cloudera.org/r/915/#comment4495>

    I can change it (you get my intent, but it is still confusing, so I should 
change it).



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4496>

    Yeah, as you say.  Let me fix up the comments.



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4497>

    will do


- stack





> If new master crashes, restart is messy
> ---------------------------------------
>
>                 Key: HBASE-3047
>                 URL: https://issues.apache.org/jira/browse/HBASE-3047
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.0
>
>         Attachments: 3047.txt
>
>
> If the master crashes, the cluster-is-up flag is left stuck on.
> On restart of the cluster, regionservers may come up before the master.  
> They'll have registered themselves in zk by the time the master assumes its 
> role, and the master will think it's joining an up-and-running cluster when in 
> fact this is a fresh startup.  Another problem is that there'll be a bad root 
> region location up in zk.  Same for meta, and at the moment we're not handling 
> a bad root or meta very well.
> Here's a sample of the kind of issues we're running into:
> {code}
> 2010-09-25 23:53:13,938 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Call to /10.20.20.188:60020 failed on local exception: java.io.IOException: Connection reset by peer
>    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:781)
>    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:255)
>    at $Proxy1.getProtocolVersion(Unknown Source)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:412)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:388)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:435)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:345)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:889)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:350)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:209)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:241)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:286)
>    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:326)
>    at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:157)
>    at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:140)
>    at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:753)
>    at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:174)
>    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
> Caused by: java.io.IOException: Connection reset by peer
>    at sun.nio.ch.FileDispatcher.read0(Native Method)
>    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>    at sun.nio.ch.IOUtil.read(IOUtil.java:206)
> {code}
> Notice that we think it's a case of processFailover, so we think we can just 
> scan meta to fix up our in-memory picture of the running cluster; only the 
> scan of meta fails because meta is not assigned.
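The failure mode in the issue description can be modeled in a few lines. What follows is a minimal, hypothetical Java sketch, not HBase's actual HMaster or AssignmentManager logic: the class name, `ClusterView`, `decide`, and `safeToScanMeta` are invented for illustration, and the rule "only call it a failover if some regionserver already reports serving regions" is an assumed heuristic. It shows why a stuck cluster-is-up flag plus early regionserver registration is not, by itself, evidence of a failover, and why the master would want to confirm the meta location before attempting the full meta scan that fails in the trace above.

```java
import java.util.Map;

/**
 * Hypothetical sketch only; not HBase's HMaster/AssignmentManager code.
 * All names are invented to illustrate the HBASE-3047 failure mode.
 */
public class StartupModeSketch {

  enum StartupMode { FRESH_START, FAILOVER }

  /** Hypothetical snapshot of what a newly started master can observe. */
  static final class ClusterView {
    final boolean clusterUpFlag;                 // may be stale if the previous master crashed
    final Map<String, Integer> regionsPerServer; // registered regionserver -> regions it reports serving
    final String metaLocation;                   // meta location recorded in zk; null if none
    final boolean metaLocationReachable;         // whether a probe of that server succeeded

    ClusterView(boolean clusterUpFlag, Map<String, Integer> regionsPerServer,
                String metaLocation, boolean metaLocationReachable) {
      this.clusterUpFlag = clusterUpFlag;
      this.regionsPerServer = regionsPerServer;
      this.metaLocation = metaLocation;
      this.metaLocationReachable = metaLocationReachable;
    }
  }

  /**
   * Assumed heuristic: the flag can be left on by a crashed master, and
   * regionservers can register before the master on a fresh start, so
   * neither proves a failover. Require a regionserver that serves regions.
   */
  static StartupMode decide(ClusterView v) {
    if (!v.clusterUpFlag) {
      return StartupMode.FRESH_START;
    }
    boolean anyServing = v.regionsPerServer.values().stream().anyMatch(n -> n > 0);
    return anyServing ? StartupMode.FAILOVER : StartupMode.FRESH_START;
  }

  /** Only scan meta if a location is recorded and that server answered. */
  static boolean safeToScanMeta(ClusterView v) {
    return v.metaLocation != null && v.metaLocationReachable;
  }

  public static void main(String[] args) {
    // Restart after a master crash: flag stuck on, regionservers registered
    // first but serving nothing, and a stale meta location left in zk.
    ClusterView restart = new ClusterView(true,
        Map.of("rs1:60020", 0, "rs2:60020", 0), "10.20.20.188:60020", false);
    System.out.println(decide(restart));          // FRESH_START
    System.out.println(safeToScanMeta(restart));  // false

    // Genuine failover: live regionservers already serving regions.
    ClusterView failover = new ClusterView(true,
        Map.of("rs1:60020", 12, "rs2:60020", 9), "rs1:60020", true);
    System.out.println(decide(failover));         // FAILOVER
    System.out.println(safeToScanMeta(failover)); // true
  }
}
```

The sketch needs Java 9 or later for `Map.of`. The design choice it illustrates is to treat leftover zk state as a hint rather than proof: an unreachable meta location is something to resolve before scanning, not a scan target.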

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
