[ https://issues.apache.org/jira/browse/HBASE-15974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330774#comment-15330774 ]

Joseph commented on HBASE-15974:
--------------------------------

Hello Vincent,

Yes, I think this will add an extra dependency on the RSs hosting the 
Replication Table. During ReplicationSourceManager.init(), when we try to claim 
orphaned queues, ReplicationQueuesHBaseImpl will run a Scanner over the entire 
Replication Table to locate them. If an RS hosting a region of the Replication 
Table happens to be down at that time, the Scanner operation will fail. We try 
to mitigate this by setting an extremely long retry period for Replication 
Table operations (HBASE-15937). As of now, operations on the Replication Table 
are retried for up to 2 hours. If the region is still unavailable after that, 
the RS making the request will abort. We are hoping that cluster startup does 
not take longer than 2 hours. The patch at HBASE-14190 should also help ensure 
that the Replication Table's initialization is prioritized during cluster 
startup.
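For reference, here is a minimal sketch of what the claim-time scan plus the 
long retry window could look like against the public HBase client API. The 
table name "replication", the interpretation of each row as a queue, and the 
retry/pause values are my assumptions for illustration, not the actual code 
from the HBASE-15937 or HBASE-15974 patches:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    public class ReplicationTableScanSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Raise the client retry count and pause so that transient
        // unavailability of a Replication Table region is retried for a very
        // long window instead of failing fast. These values are illustrative
        // placeholders, not the tuned ones from HBASE-15937.
        conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 700);
        conf.setLong(HConstants.HBASE_CLIENT_PAUSE, 10000L);

        try (Connection connection = ConnectionFactory.createConnection(conf);
             // "replication" as the table name is an assumption here
             Table replicationTable =
                 connection.getTable(TableName.valueOf("replication"));
             // Full-table scan, as done when claiming orphaned queues during
             // ReplicationSourceManager.init()
             ResultScanner scanner = replicationTable.getScanner(new Scan())) {
          for (Result row : scanner) {
            // Each row would represent one replication queue; a queue whose
            // owning RS is no longer alive is an orphan to be claimed.
            System.out.println("Queue row: " + new String(row.getRow()));
          }
        }
        // If a region of the Replication Table stays unassigned past the
        // retry window, the scan throws and the claiming RS would abort.
      }
    }

The retry count and pause above are just placeholders chosen to suggest a 
roughly 2-hour window; the real values come from the HBASE-15937 patch.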

> Create a ReplicationQueuesClientHBaseImpl
> -----------------------------------------
>
>                 Key: HBASE-15974
>                 URL: https://issues.apache.org/jira/browse/HBASE-15974
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>            Reporter: Joseph
>            Assignee: Joseph
>         Attachments: HBASE-15974.patch
>
>
> Currently ReplicationQueuesClient uses a ZooKeeper implementation, 
> ReplicationQueuesClientZkImpl, which reads from the ZNodes where 
> ReplicationQueuesZkImpl tracks WALs. So we need to create an HBase 
> implementation of ReplicationQueuesClient.
> The review is posted at https://reviews.apache.org/r/48521/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
