[ https://issues.apache.org/jira/browse/HBASE-15974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330774#comment-15330774 ]
Joseph commented on HBASE-15974:
--------------------------------
Hello Vincent,
Yes, I think that this will add an extra dependency on the RSs hosting the
Replication Table. During ReplicationSourceManager.init(), when we try to claim
orphaned queues, ReplicationQueuesHBaseImpl will run a Scanner over the entire
Replication Table to locate the orphaned queues. If an RS hosting a region of
the Replication Table happens to be down at that time, the Scanner operation
will fail.
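A minimal sketch of what that full-table scan looks like with the stock HBase
client API follows. The table name "hbase:replication" and the row layout
(queue owner in the row key) are illustrative assumptions, not the schema from
the patch.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class OrphanedQueueScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         // "hbase:replication" is a placeholder table name for this sketch
         Table replTable = conn.getTable(TableName.valueOf("hbase:replication"))) {
      // An unbounded Scan touches every region of the table, so any region
      // whose hosting RS is down stalls the scan in client-side retries.
      try (ResultScanner scanner = replTable.getScanner(new Scan())) {
        for (Result row : scanner) {
          // Hypothetical layout: the row key names the RS that owns the queue.
          String queueOwner = Bytes.toString(row.getRow());
          System.out.println("Found replication queue owned by " + queueOwner);
        }
      }
    }
  }
}
{code}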
We try to combat this by setting an extremely long retry period for Replication
Table operations: HBASE-15937. As of now, operations on the Replication Table
have a retry window of 2 hours. If the region is still unavailable after that
time, the RS making the request will abort. We are hoping that cluster startup
does not take longer than 2 hours. The patch at HBASE-14190 should also help
ensure that the Replication Table's initialization is prioritized during
cluster startup.
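For reference, here is a rough sketch of how a multi-hour retry budget can be
dialed in on a client Configuration. The config keys are the standard HBase
client ones, but the concrete values, and whether HBASE-15937 sets them this
way, are assumptions for illustration.
{code:java}
import org.apache.hadoop.conf.Configuration;

public final class ReplicationTableRetryConfSketch {
  private ReplicationTableRetryConfSketch() {}

  // Returns a copy of the base Configuration with a very large retry
  // budget intended for Replication Table operations.
  public static Configuration withLongRetries(Configuration base) {
    Configuration conf = new Configuration(base);
    // Base pause between attempts; HBase layers an exponential backoff
    // schedule on top of this, so total wait grows faster than linear.
    conf.setLong("hbase.client.pause", 10 * 1000L);
    // With the built-in backoff schedule, this works out to roughly two
    // hours of retrying before the request fails and the calling RS
    // aborts. Both values are assumptions, not the ones from the patch.
    conf.setInt("hbase.client.retries.number", 12);
    return conf;
  }
}
{code}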
> Create a ReplicationQueuesClientHBaseImpl
> -----------------------------------------
>
> Key: HBASE-15974
> URL: https://issues.apache.org/jira/browse/HBASE-15974
> Project: HBase
> Issue Type: Sub-task
> Components: Replication
> Reporter: Joseph
> Assignee: Joseph
> Attachments: HBASE-15974.patch
>
>
> Currently, ReplicationQueuesClient uses the ZooKeeper-backed implementation
> ReplicationQueuesClientZkImpl, which reads from the ZNode where
> ReplicationQueuesZkImpl tracks WALs. So we need to create an HBase
> implementation of ReplicationQueuesClient.
> The review is posted at https://reviews.apache.org/r/48521/