[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102083#comment-13102083
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
-------------------------------------------

Here it goes:

{quote}
2011-09-09 19:44:28,224 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=sv4r17s40,60020,1313587209632, load=(requests=4292, regions=186, usedHeap=11929, maxHeap=24749): connection to cluster: 5-0x130d4937f890066 connection to cluster: 5-0x130d4937f890066 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)

{quote}

As you can see, it's pretty generic; the only thing that let me trace it to 
the peer connection was the "connection to cluster" string. The fix will take 
place around ReplicationPeer, which contains a ZKW (ZooKeeperWatcher) that 
requires an Abortable. At the moment that Abortable is the RS itself, so an 
expired session on the peer aborts the whole region server. Instead we should 
pass our own, or maybe ReplicationSource should implement it.
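
To make the "pass our own" idea concrete, here is a rough sketch, not a patch. The Abortable interface below is a simplified local stand-in for the HBase one so that the snippet compiles on its own, and PeerAbortable plus its method names are hypothetical. The point is only that an abort coming from the peer's watcher gets recorded for the replication code to handle instead of taking down the region server.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Simplified stand-in for HBase's Abortable (assumption, defined locally). */
interface Abortable {
    void abort(String why, Throwable cause);
}

/** Hypothetical Abortable scoped to a single replication peer. */
final class PeerAbortable implements Abortable {
    private final String peerId;
    private final AtomicBoolean reconnectRequested = new AtomicBoolean(false);

    PeerAbortable(String peerId) {
        this.peerId = peerId;
    }

    @Override
    public void abort(String why, Throwable cause) {
        // Record the failure instead of aborting the region server.
        System.err.println("Peer " + peerId + " lost its ZooKeeper session: " + why);
        reconnectRequested.set(true);
    }

    /** True if an abort was reported and nobody has handled it yet. */
    boolean isReconnectRequested() {
        return reconnectRequested.get();
    }

    /** Clears the flag; returns true if the caller should rebuild the peer connection. */
    boolean consumeReconnectRequest() {
        return reconnectRequested.getAndSet(false);
    }
}
```

The replication code would then check consumeReconnectRequest() in its loop and recreate the peer's watcher when it returns true. How the watcher actually gets rebuilt is left out here on purpose.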

> [replication] ReplicationSource can't recover from session expired on remote 
> clusters
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-3130
>                 URL: https://issues.apache.org/jira/browse/HBASE-3130
>             Project: HBase
>          Issue Type: Bug
>          Components: replication
>            Reporter: Jean-Daniel Cryans
>
> Currently ReplicationSource cannot recover when its ZooKeeper connection to 
> its remote cluster expires. HLogs are still being tracked, but a cluster 
> restart (or a rolling restart) is required to resume replication.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
