[
https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010336#comment-15010336
]
Ashu Pachauri commented on HBASE-14777:
---------------------------------------
[~appy]
{quote}
why are we returning index and iterating reverse and all. Can't we simply
remove first element every time given the fact we are going in order and
blocking on futures. Such a simple logic would have avoided the bug in first
place.
{quote}
This is a perfectly valid question, and I also had the same doubt, we can
totally make it work without the use of ordinals. But, removing the first
element always won't work. If you look at the code more closely:
{code}
for (Future<Integer> f : futures) {
try {
// wait for all futures, remove successful parts
// (only the remaining parts will be retried)
entryLists.remove(f.get());
} catch (InterruptedException ie) {
iox = new IOException(ie);
} catch (ExecutionException ee) {
// cause must be an IOException
iox = (IOException)ee.getCause();
}
}
{code}
Suppose, we always remove the first element. If an
InterruptedException/ExecutionException is thrown at index 0, we will not
remove the element. Now, we don't want to retry, we want to check other Futures
before we retry. So, we move to index 1, check the future and remove an entry
if it succeeds. Now, which element would you remove, index 0 or index 1? To
make this work, at any point in the loop, you will have to keep track of how
many futures failed before this future in the list (the index to remove is
offset by that number from index 0). Now, this is perfectly simple to do, just
keep a running counter of successful futures. But, what is already being done
here is also perfectly simple. Also, having ordinals gives you other guarantees
that you are referring to the correct Entry in the list, if code became more
complicated in the future.
{quote}
Why change utility cluster from 2 to 4? If it is a non-trivial reason, please
add a comment for the same.
{quote}
Yes, because InterClusterReplicationEndpoint uses number of sinks as a
component in deciding the batches for shipping. I thought having at most two
batches was not much of a test. I will add a comment for the same.
> Replication fails with IndexOutOfBoundsException
> ------------------------------------------------
>
> Key: HBASE-14777
> URL: https://issues.apache.org/jira/browse/HBASE-14777
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Bhupendra Kumar Jain
> Assignee: Bhupendra Kumar Jain
> Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14777-1.patch, HBASE-14777-2.patch,
> HBASE-14777.patch
>
>
> Replication fails with IndexOutOfBoundsException
> {code}
> regionserver.ReplicationSource$ReplicationSourceWorkerThread(939):
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint
> threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size:
> 1
> at java.util.ArrayList.rangeCheck(Unknown Source)
> at java.util.ArrayList.remove(Unknown Source)
> at
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222)
> {code}
> Its happening due to incorrect removal of entries from the replication
> entries list.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)