[
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773541#comment-13773541
]
Hudson commented on HBASE-7634:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK #4541 (See
[https://builds.apache.org/job/HBase-TRUNK/4541/])
HBASE-9594 Add reference documentation on changes made by HBASE-7634
(Replication handling of peer cluster changes) (stack: rev 1525110)
* /hbase/trunk/src/main/site/xdoc/replication.xml
> Replication handling of changes to peer clusters is inefficient
> ---------------------------------------------------------------
>
> Key: HBASE-7634
> URL: https://issues.apache.org/jira/browse/HBASE-7634
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.95.2
> Reporter: Gabriel Reid
> Assignee: Gabriel Reid
> Fix For: 0.98.0, 0.95.2
>
> Attachments: HBASE-7634.patch, HBASE-7634.v2.patch,
> HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch,
> HBASE-7634.v6.patch
>
>
> The handling of changes to the region servers in a replication peer
> cluster is currently quite inefficient. The list of region servers that are
> being replicated to is only updated if a large number of issues are
> encountered while replicating.
> This can cause it to take quite a while to recognize that a number of the
> regionservers in a peer cluster are no longer available. A potentially bigger
> problem is that if a replication peer cluster is started with a small number
> of regionservers, and then more region servers are added after replication
> has started, the additional region servers will never be used for replication
> (unless there are failures on the in-use regionservers).
> Part of the current issue is that the retry code in
> ReplicationSource#shipEdits checks a randomly-chosen replication peer
> regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a
> replication write has failed on a different randomly-chosen replication peer.
> If the peer is seen as not down, another randomly-chosen peer is used for
> writing.
> A second part of the issue is that changes to the list of region servers in a
> peer cluster are not actively detected; they are only picked up after a
> certain number of failures have occurred when trying to ship edits.
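
To make the two problems described above concrete, here is a minimal sketch of one way a replication source could track its sinks: failures are counted against the specific region server that failed, and the sink list is replaced whenever the peer cluster's membership changes. The class and method names (PeerSinkTracker, onPeerListChanged, reportFailure) are hypothetical and used only for illustration; this is not the code from the attached patches, and the notification mechanism that would call onPeerListChanged is assumed rather than shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical illustration only; not the HBase implementation. */
class PeerSinkTracker {
    private final Random random;
    private final int badSinkThreshold; // failures before a single sink is dropped
    private final Map<String, Integer> failures = new HashMap<>();
    private List<String> sinks = new ArrayList<>();

    PeerSinkTracker(int badSinkThreshold, long seed) {
        this.badSinkThreshold = badSinkThreshold;
        this.random = new Random(seed);
    }

    /**
     * Assumed to be called whenever the peer cluster's list of region
     * servers changes, so added servers become usable immediately.
     */
    synchronized void onPeerListChanged(List<String> liveServers) {
        sinks = new ArrayList<>(liveServers);
        // Forget failure counts for servers that are no longer listed.
        failures.keySet().retainAll(sinks);
    }

    /** Picks a random sink from the current list. */
    synchronized String chooseSink() {
        if (sinks.isEmpty()) {
            throw new IllegalStateException("no sinks available");
        }
        return sinks.get(random.nextInt(sinks.size()));
    }

    /** Records a failure against the sink that actually failed. */
    synchronized void reportFailure(String sink) {
        if (!sinks.contains(sink)) {
            return; // already dropped or never known
        }
        int count = failures.merge(sink, 1, Integer::sum);
        if (count >= badSinkThreshold) {
            sinks.remove(sink);
            failures.remove(sink);
        }
    }

    /** Returns a copy of the sinks currently in use. */
    synchronized List<String> currentSinks() {
        return new ArrayList<>(sinks);
    }
}
```

In this sketch a failed write is charged to the server it was sent to, rather than checking a different randomly-chosen server, and newly added region servers are picked up without waiting for failures to accumulate.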