[jira] [Commented] (HBASE-14777) Replication fails with IndexOutOfBoundsException

Ashu Pachauri (JIRA) Tue, 17 Nov 2015 22:26:16 -0800

    [ https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010336#comment-15010336 ]


Ashu Pachauri commented on HBASE-14777:
---------------------------------------

[~appy] 
{quote}
why are we returning index and iterating reverse and all. Can't we simply 
remove first element every time given the fact we are going in order and 
blocking on futures. Such a simple logic would have avoided the bug in first 
place.
{quote}
This is a perfectly valid question, and I had the same doubt. We can 
certainly make it work without ordinals, but always removing the first 
element won't work. If you look at the code more closely:
{code}
for (Future<Integer> f : futures) {
  try {
    // wait for all futures, remove successful parts
    // (only the remaining parts will be retried)
    entryLists.remove(f.get());
  } catch (InterruptedException ie) {
    iox = new IOException(ie);
  } catch (ExecutionException ee) {
    // cause must be an IOException
    iox = (IOException) ee.getCause();
  }
}
{code}
Suppose we always remove the first element. If the future at index 0 throws 
an InterruptedException or ExecutionException, we do not remove its element. 
We don't want to retry right away; we want to check the other futures first. 
So we move on to index 1, check that future, and remove an entry if it 
succeeded. Which element do you remove now, index 0 or index 1? To make this 
work, at every point in the loop you have to keep track of how many futures 
failed before the current one (the index to remove is offset from index 0 by 
that number). This is perfectly simple to do: just keep a running counter of 
failed futures. But what is already being done here is also perfectly simple. 
Ordinals also give you an additional guarantee that you are referring to the 
correct Entry in the list if the code becomes more complicated in the future.
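
For illustration only, the running-counter alternative could look roughly like 
the sketch below. It is not taken from any of the attached patches: the class 
name RunningCounterSketch, the method removeSuccessful, and the use of String 
batches with CompletableFuture are all made up for the example. It assumes the 
futures are in the same order as the batches, and it ignores the ordinal each 
future returns.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch, not HBase code.
public class RunningCounterSketch {

  /**
   * Removes the batch of every future that succeeded and returns how many
   * batches are left for a retry. futures.get(i) must belong to the batch
   * that was at position i before the loop started.
   */
  static <T> int removeSuccessful(List<List<T>> batches, List<Future<Integer>> futures) {
    int failedSoFar = 0;
    for (Future<Integer> f : futures) {
      try {
        f.get(); // block until this batch is done; the returned ordinal is unused
        // Earlier successes were already removed, so the only batches still
        // in front of the current one are those that failed.
        batches.remove(failedSoFar); // int argument, so List.remove(int index) is chosen
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        failedSoFar++;
      } catch (ExecutionException ee) {
        failedSoFar++;
      }
    }
    return failedSoFar;
  }

  public static void main(String[] args) {
    List<List<String>> batches = new ArrayList<>();
    batches.add(Arrays.asList("a"));
    batches.add(Arrays.asList("b"));
    batches.add(Arrays.asList("c"));

    // The first batch fails, the other two succeed.
    CompletableFuture<Integer> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IOException("sink unavailable"));
    List<Future<Integer>> futures = new ArrayList<>();
    futures.add(failed);
    futures.add(CompletableFuture.completedFuture(1));
    futures.add(CompletableFuture.completedFuture(2));

    int left = removeSuccessful(batches, futures);
    System.out.println(left + " batch(es) left: " + batches); // 1 batch(es) left: [[a]]
  }
}
```

With the first future failing, both later successes are removed at index 1, 
not index 0, which is exactly the offset described above. Passing the counter 
as an int also matters: calling List.remove with a boxed Integer would resolve 
to remove(Object) instead of remove(int).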

{quote}
Why change utility cluster from 2 to 4? If it is a non-trivial reason, please 
add a comment for the same.
{quote}
Yes, because HBaseInterClusterReplicationEndpoint uses the number of sinks as 
one factor in deciding how many batches to ship. I thought having at most two 
batches was not much of a test. I will add a comment explaining this.

> Replication fails with IndexOutOfBoundsException
> ------------------------------------------------
>
>                 Key: HBASE-14777
>                 URL: https://issues.apache.org/jira/browse/HBASE-14777
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Bhupendra Kumar Jain
>            Assignee: Bhupendra Kumar Jain
>            Priority: Critical
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: HBASE-14777-1.patch, HBASE-14777-2.patch, 
> HBASE-14777.patch
>
>
> Replication fails with IndexOutOfBoundsException 
> {code}
> regionserver.ReplicationSource$ReplicationSourceWorkerThread(939): org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>       at java.util.ArrayList.rangeCheck(Unknown Source)
>       at java.util.ArrayList.remove(Unknown Source)
>       at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222)
> {code}
> It's happening due to incorrect removal of entries from the replication 
> entries list. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
