[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105578#comment-13105578
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
-------------------------------------------

@Chris

Sorry for the late answer.

bq. 1. I am now specifically catching the SESSIONEXPIRED KeeperExceptions that 
are thrown by methods which use a ZookeeperWatcher from ReplicationPeer.

They are all caught at a super low level (ZKW), so I don't think you need that.

bq. 2. I am adding a public method to ReplicationZookeeper that is responsible 
for retrying/opening a new connection to the peer zookeeper cluster.

There might be some synchronization problems with ReplicationSource, so watch out.
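
One way such a reconnect method could be guarded, as a rough sketch only. The class, the generation counter, and the Supplier-based connector are all hypothetical and are not HBase code. The idea is that if several threads notice the same dead session, only the first one opens a new connection:

```java
import java.util.function.Supplier;

/** Hypothetical holder for a peer connection; not part of HBase. */
public class PeerConnectionHolder<C> {
    private final Supplier<C> connector; // hypothetical factory for a new connection
    private C current;
    private long generation = 0;         // bumped on every successful reconnect

    public PeerConnectionHolder(Supplier<C> connector) {
        this.connector = connector;
        this.current = connector.get();
    }

    public synchronized C get() {
        return current;
    }

    public synchronized long generation() {
        return generation;
    }

    /**
     * Reconnects only if no other caller has reconnected since the caller
     * observed seenGeneration; otherwise returns the already-renewed connection.
     */
    public synchronized C reconnectIfStale(long seenGeneration) {
        if (seenGeneration == generation) {
            current = connector.get();
            generation++;
        }
        return current;
    }
}
```

A caller would read generation() before using the connection and pass that value back when it hits an expired session, so two sources failing at the same time trigger a single reconnect.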

bq. 3. I will modify ReplicationSource so that it implements the Abortable 
interface.

I'm -0; I'd prefer that all the handling be kept in ReplicationZookeeper. 
Ideally ReplicationSource would just see that it can't reach the peer 
for some reason and retry.

bq. 4. Currently I am going back and forth about two retry strategies:

Definitely 4B; that's how the code works at the moment. Wait until the peer 
comes back; the operator can always turn it off.
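
To make the "wait until the peer comes back" behavior concrete, here is a minimal sketch. Every name in it (PeerRetrySketch, waitForPeer, the tryReconnect callback, the replicationEnabled flag) is an assumption made for illustration and is not existing HBase code. The loop keeps retrying for as long as it takes and stops only when the peer is reachable again or an operator disables replication:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names exist in HBase. */
public class PeerRetrySketch {

    /**
     * Blocks until the peer is reachable again or replication is switched off.
     * Returns true if the peer came back, false if replication was disabled
     * or the waiting thread was interrupted.
     */
    static boolean waitForPeer(BooleanSupplier tryReconnect,
                               AtomicBoolean replicationEnabled,
                               long sleepMillis) {
        while (replicationEnabled.get()) {
            if (tryReconnect.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicBoolean enabled = new AtomicBoolean(true);
        int[] attempts = {0};
        // Simulated peer that becomes reachable on the third attempt.
        boolean ok = waitForPeer(() -> ++attempts[0] >= 3, enabled, 10L);
        System.out.println("reconnected=" + ok + " after " + attempts[0] + " attempts");
    }
}
```

There is deliberately no retry limit here: the only exits are a successful reconnect, the operator flipping the flag, or an interrupt.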

> [replication] ReplicationSource can't recover from session expired on remote 
> clusters
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-3130
>                 URL: https://issues.apache.org/jira/browse/HBASE-3130
>             Project: HBase
>          Issue Type: Bug
>          Components: replication
>            Reporter: Jean-Daniel Cryans
>
> Currently ReplicationSource cannot recover when its zookeeper connection to 
> its remote cluster expires. HLogs are still being tracked, but a cluster 
> restart is required to continue replication (or a rolling restart).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
