[jira] [Resolved] (HBASE-1736) If RS can't talk to master, pause; more importantly, don't split (Currently we do and splits are lost and table is wounded)

stack (JIRA) Wed, 16 Jul 2014 11:54:47 -0700

     [ https://issues.apache.org/jira/browse/HBASE-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


stack resolved HBASE-1736.
--------------------------

    Resolution: Invalid

All is different now, 5 years later.

> If RS can't talk to master, pause; more importantly, don't split (Currently 
> we do and splits are lost and table is wounded)
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1736
>                 URL: https://issues.apache.org/jira/browse/HBASE-1736
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>
> What I saw was the master shutting itself down because it had lost its zk lease.
> Fine. The RS, though, doesn't look like it can deal with this situation.
> We'll see stuff like this:
> {code}
> ...failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:744)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
>     at $Proxy0.regionServerReport(Unknown Source)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:470)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:305)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:826)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:707)
>     ... 4 more
> {code}
> ... all over the regionserver as it tries to send heartbeats to the master on
> this broken connection.
> On split, we close the parent and add the children to the catalog, but then
> when we try to tell the master about the split, it fails. This means the
> children never get deployed. Meantime the parent is offline.
> This issue is about going through the regionserver and, anywhere it talks to
> the master, making sure that on fault no damage is done to the table and
> that the regionserver puts a pause on splitting.
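
For illustration only, the "pause on splitting" idea in the description above could be sketched as a small gate that the heartbeat loop updates and the split path consults. This is a hedged sketch, not HBase code: `SplitGate`, `recordReport`, `splitsPaused`, `trySplit` and the `maxMissedReports` threshold are all hypothetical names invented here, and the real regionserver would need to wire such a check into its own split and report code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of HBase): pauses splits while the master is unreachable. */
final class SplitGate {
    private final int maxMissedReports;
    // Consecutive failed reports to the master, capped at maxMissedReports.
    private final AtomicInteger missedReports = new AtomicInteger();

    SplitGate(int maxMissedReports) {
        if (maxMissedReports < 1) {
            throw new IllegalArgumentException("maxMissedReports must be >= 1");
        }
        this.maxMissedReports = maxMissedReports;
    }

    /** Call after every heartbeat attempt: true if the master answered, false otherwise. */
    void recordReport(boolean succeeded) {
        if (succeeded) {
            missedReports.set(0);
        } else {
            missedReports.updateAndGet(n -> Math.min(n + 1, maxMissedReports));
        }
    }

    /** True once enough consecutive reports have failed. */
    boolean splitsPaused() {
        return missedReports.get() >= maxMissedReports;
    }

    /**
     * Runs the split only if the master looked reachable at the time of the check.
     * Returns false without touching the region when splits are paused, so the
     * parent stays online instead of being closed for a split nobody will hear about.
     */
    boolean trySplit(Runnable split) {
        if (splitsPaused()) {
            return false;
        }
        split.run();
        return true;
    }
}
```

A check like this only narrows the window: the master can still go away between the check and the point where the split is reported, so the split itself would still need to be recoverable on failure.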



--
This message was sent by Atlassian JIRA
(v6.2#6252)
