[
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105578#comment-13105578
]
Jean-Daniel Cryans commented on HBASE-3130:
-------------------------------------------
@Chris
Sorry for the late answer.
bq. 1. I am now specifically catching the SESSIONEXPIRED KeeperExceptions that
are thrown by methods which use a ZookeeperWatcher from ReplicationPeer.
They are all caught at a super low level (ZKW) so I don't think you need that.
bq. 2. I am adding a public method to ReplicationZookeeper that is responsible
for retrying/opening a new connection to the peer zookeeper cluster.
There might be some synchronization problems with ReplicationSource, watch out.
bq. 3. I will modify ReplicationSource so that it implements the Abortable
interface.
I'm -0, I'd prefer if all the handling was kept in ReplicationZookeeper.
Ideally for ReplicationSource it would just see that it can't reach the peer
for some reason and retry.
bq. 4. Currently I am going back and forth about two retry strategies:
Definitely 4B, that's how the code works at the moment. Wait until the peer
comes back; the operator can always turn it off.
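To illustrate the "wait until the peer comes back" behavior, here is a minimal sketch. It is not the actual ReplicationSource or ReplicationZookeeper code. The class name PeerRetrySketch, the tryConnect callback, the enabled flag, and the sleep parameters are all hypothetical, and the capped backoff is an assumption rather than something taken from HBase.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: keep retrying the peer until it answers or the operator disables replication. */
final class PeerRetrySketch {
    // Stands in for the operator's on/off switch (assumption, not an HBase field).
    private final AtomicBoolean replicationEnabled = new AtomicBoolean(true);
    private final long baseSleepMs;
    private final long maxSleepMs;

    PeerRetrySketch(long baseSleepMs, long maxSleepMs) {
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
    }

    /** Called when the operator turns replication off; a running wait loop exits on its next check. */
    void disableReplication() {
        replicationEnabled.set(false);
    }

    /**
     * Calls tryConnect repeatedly, sleeping between attempts with a capped,
     * doubling delay. Returns true once tryConnect succeeds, or false if
     * replication was disabled or the thread was interrupted first.
     */
    boolean waitForPeer(BooleanSupplier tryConnect) {
        long sleepMs = baseSleepMs;
        while (replicationEnabled.get()) {
            if (tryConnect.getAsBoolean()) {
                return true; // peer is reachable again
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
            sleepMs = Math.min(sleepMs * 2, maxSleepMs); // back off, but never beyond the cap
        }
        return false; // operator turned replication off
    }
}
```

The caller only sees "connected" or "gave up", which matches the idea that ReplicationSource should just notice it can't reach the peer and retry, with the reconnection details kept elsewhere.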
> [replication] ReplicationSource can't recover from session expired on remote
> clusters
> -------------------------------------------------------------------------------------
>
> Key: HBASE-3130
> URL: https://issues.apache.org/jira/browse/HBASE-3130
> Project: HBase
> Issue Type: Bug
> Components: replication
> Reporter: Jean-Daniel Cryans
>
> Currently ReplicationSource cannot recover when its zookeeper connection to
> its remote cluster expires. HLogs are still being tracked, but a cluster
> restart is required to continue replication (or a rolling restart).