[
https://issues.apache.org/jira/browse/HBASE-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack resolved HBASE-1736.
--------------------------
Resolution: Invalid
All is different now, 5 years later.
> If RS can't talk to master, pause; more importantly, don't split (Currently
> we do and splits are lost and table is wounded)
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-1736
> URL: https://issues.apache.org/jira/browse/HBASE-1736
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Critical
>
> What I saw was master shutting itself down because it had lost zk lease.
> Fine. The RS though doesn't look like it can deal with this situation.
> We'll see stuff like this:
> {code}
> ...failed on connection exception: java.net.ConnectException: Connection
> refused
> at
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:744)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
> at $Proxy0.regionServerReport(Unknown Source)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:470)
> at java.lang.Thread.run(Unknown Source)
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
> at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
> at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:305)
> at
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:826)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:707)
> ... 4 more
> {code}
> ... all over the regionserver as it tries to send heartbeat to master on this
> broken connection.
> On split, we close parent, add children to the catalog but then when we try
> to tell the master about the split, it fails. Means the children never get
> deployed. Meantime the parent is offline.
> This issue is about going through the regionserver and anytime it has a
> connection to master, make sure on fault that no damage is done the table and
> then that the regionserver puts a pause on splitting.
--
This message was sent by Atlassian JIRA
(v6.2#6252)