[ https://issues.apache.org/jira/browse/HBASE-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004785#comment-15004785 ]
Ashu Pachauri commented on HBASE-14811:
---------------------------------------
[~eclark] On second thought, it is not just the retry logic that is broken.
This bug also causes HBaseInterClusterReplicationEndpoint.replicate to fail
every single time, because the IndexOutOfBoundsException is not handled
there. The ReplicationSource keeps resending the same batch, so replication
is completely stuck. That might make this a blocker rather than just
critical.
[~mbertozzi] Looks like it is. Thanks for pointing it out.
> HBaseInterClusterReplicationEndpoint retry logic is broken
> ----------------------------------------------------------
>
> Key: HBASE-14811
> URL: https://issues.apache.org/jira/browse/HBASE-14811
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.0.0, 1.0.2, 1.2.0, 1.2.1, 0.98.16
> Reporter: Ashu Pachauri
> Assignee: Ashu Pachauri
> Priority: Critical
>
> In HBaseInterClusterReplicationEndpoint, we do something like this:
> {code}
> entryLists.remove(f.get());
> {code}
> where f.get() returns an ordinal: the index in entryLists of the element
> that just finished replicating successfully. We remove these entries so
> that, after a failure, we retry with only the remaining elements. Since
> entryLists is an ArrayList, removing an element shifts all subsequent
> elements left by one, so the ordinals still to be processed no longer match
> the positions they were computed for. A later removal can then delete the
> wrong entry or index past the end of the list. This breaks the intended
> functionality. The fix is to collect the ordinals, sort them in reverse
> order, and then perform the deletions in one go.
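
The failure mode and the proposed fix can be sketched in isolation. This is a
minimal illustration only, not the HBase code: the class OrdinalRemoval and
the methods removeNaively and removeByOrdinals are hypothetical names, plain
strings stand in for the replication batches, and the ordinals are assumed to
be distinct, valid indices into the original list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OrdinalRemoval {

    /**
     * Removes each ordinal in arrival order. Every removal shifts the
     * later elements left, so subsequent ordinals hit the wrong element
     * or fall outside the list (IndexOutOfBoundsException).
     */
    static <T> void removeNaively(List<T> items, List<Integer> ordinals) {
        for (int ordinal : ordinals) {
            // 'ordinal' is a primitive int, so this resolves to remove(int index).
            items.remove(ordinal);
        }
    }

    /**
     * Sketch of the fix described above: sort the ordinals in descending
     * order first, so each removal only shifts elements whose ordinals
     * have already been processed.
     */
    static <T> void removeByOrdinals(List<T> items, List<Integer> ordinals) {
        List<Integer> sorted = new ArrayList<>(ordinals); // leave the caller's list untouched
        sorted.sort(Collections.reverseOrder());
        for (int ordinal : sorted) {
            items.remove(ordinal); // remove(int index), highest index first
        }
    }
}
```

With entries [a, b, c] and succeeded ordinals 0 and 1, removeNaively leaves
[b] instead of [c]; with ordinals 0 and 2 it throws
IndexOutOfBoundsException. removeByOrdinals leaves [c] and [b] respectively.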
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)