[
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116016#comment-13116016
]
Jean-Daniel Cryans commented on HBASE-3130:
-------------------------------------------
I'd also add that I was able to test the patch (on 0.90) and it really works.
Proof:
First it loses the connection:
{quote}
2011-09-27 16:44:54,984 WARN org.apache.hadoop.hbase.replication.ReplicationPeer: connection to cluster: 10.10.30.7:2181:/hbase1-0x132ad0f29d70017 connection to cluster: 10.10.30.7:2181:/hbase1-0x132ad0f29d70017 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-09-27 16:44:54,984 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
{quote}
Later, when it tries to replicate, it talks to ZK again, hits the expired
session, and works after reloading the connection:
{quote}
2011-09-27 16:49:03,738 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since we are unable to replicate, sleeping 1000 times 10
2011-09-27 16:49:13,738 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Lost the ZooKeeper connection for peer 1
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase1/rs
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:389)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndGetAsAddresses(ZKUtil.java:355)
    at org.apache.hadoop.hbase.replication.ReplicationZookeeper.fetchSlavesAddresses(ReplicationZookeeper.java:268)
    at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getSlavesAddresses(ReplicationZookeeper.java:239)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:205)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)
2011-09-27 16:49:13,772 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.10.30.7:2181 sessionTimeout=20000 watcher=connection to cluster: 10.10.30.7:2181:/hbase1
2011-09-27 16:49:13,773 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /10.10.30.7:2181
2011-09-27 16:49:14,111 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hbasedev.sfo.stumble.net/10.10.30.7:2181, initiating session
2011-09-27 16:49:14,140 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hbasedev.sfo.stumble.net/10.10.30.7:2181, sessionid = 0x132ad0f29d70024, negotiated timeout = 20000
{quote}
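For readers who want the shape of that recovery without reading the patch, here
is a minimal sketch of the general "reload on session expiry" pattern. It is not
the HBASE-3130 patch and does not use the HBase or ZooKeeper APIs: every name in
it (PeerReloadSketch, PeerConnection, ReloadingPeer, firstSessionExpired, and the
local SessionExpiredException stand-in) is hypothetical and exists only for
illustration. The one fact it relies on is that an expired ZooKeeper session
cannot be revived, so the client has to open a new one.

```java
import java.util.List;
import java.util.function.Supplier;

/**
 * Sketch of "reload the peer connection when its session has expired".
 * All names are hypothetical; this is not the HBASE-3130 patch.
 */
final class PeerReloadSketch {

    /** Stand-in for ZooKeeper's SessionExpiredException. */
    static final class SessionExpiredException extends Exception {
        SessionExpiredException(String path) {
            super("Session expired for " + path);
        }
    }

    /** Stand-in for a ZooKeeper handle bound to a single session. */
    interface PeerConnection {
        List<String> listChildren(String path) throws SessionExpiredException;
    }

    /** Holds the current connection and replaces it once its session expires. */
    static final class ReloadingPeer {
        private final Supplier<PeerConnection> factory;
        private PeerConnection current;
        private int reloads = 0;

        ReloadingPeer(Supplier<PeerConnection> factory) {
            this.factory = factory;
            this.current = factory.get();
        }

        /** An expired session never recovers, so open a new one and retry once. */
        List<String> listChildren(String path) throws SessionExpiredException {
            try {
                return current.listChildren(path);
            } catch (SessionExpiredException e) {
                current = factory.get();
                reloads++;
                return current.listChildren(path);
            }
        }

        int reloads() {
            return reloads;
        }
    }

    /** Demo factory: the first connection it creates is already expired. */
    static Supplier<PeerConnection> firstSessionExpired(List<String> children) {
        int[] created = {0};
        return () -> {
            boolean expired = created[0]++ == 0;
            return path -> {
                if (expired) {
                    throw new SessionExpiredException(path);
                }
                return children;
            };
        };
    }

    public static void main(String[] args) throws Exception {
        ReloadingPeer peer =
            new ReloadingPeer(firstSessionExpired(List.of("rs1", "rs2")));
        System.out.println(peer.listChildren("/hbase1/rs")
            + " after " + peer.reloads() + " reload(s)");
    }
}
```

The sketch retries only once after a reload; if the fresh connection also
fails, the exception propagates to the caller, which can back off and try again
on its next replication attempt.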
> [replication] ReplicationSource can't recover from session expired on remote
> clusters
> -------------------------------------------------------------------------------------
>
> Key: HBASE-3130
> URL: https://issues.apache.org/jira/browse/HBASE-3130
> Project: HBase
> Issue Type: Bug
> Components: replication
> Affects Versions: 0.92.0
> Reporter: Jean-Daniel Cryans
> Assignee: Chris Trezzo
> Fix For: 0.92.0
>
> Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt
>
>
> Currently ReplicationSource cannot recover when its ZooKeeper connection to
> its remote cluster expires. HLogs are still being tracked, but a cluster
> restart (or a rolling restart) is required to continue replication.