[ 
https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458274#comment-13458274
 ] 

Devaraj Das commented on HBASE-6649:
------------------------------------

Okay, a plausible explanation:
1. ReplicationSource.readAllEntriesToReplicateOrNextFile hits an IOException 
(which is what prints the "Break on IOE:" log line) but otherwise ignores the 
exception.
2. When readAllEntriesToReplicateOrNextFile returns, the reader's file-pointer 
position is queried and 'this.position' is set to it, even though the reader's 
file pointer may by then be pointing at gibberish.
3. Eventually readAllEntriesToReplicateOrNextFile gets called again, and this 
time this.reader.next inside it throws an IndexOutOfBoundsException because it 
read gibberish (looking at the code of DataInputStream.java, one of the cases 
where IndexOutOfBoundsException is thrown is when the length passed to 
readFully is less than 0).
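
To illustrate step 3 in isolation, here is a minimal sketch of how a garbage 
length field can surface as an IndexOutOfBoundsException from 
DataInputStream.readFully. The record layout (a 4-byte length followed by the 
payload) and the class and method names are assumptions made for the demo; 
this is not the actual log reader code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NegativeLengthDemo {
    /**
     * Reads one record from a hypothetical layout: a 4-byte length followed by
     * that many payload bytes. Returns true if the read fails with
     * IndexOutOfBoundsException, false if it completes normally.
     */
    public static boolean readRecordThrowsIOOBE(byte[] raw) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        try {
            int len = in.readInt();                  // garbage bytes can decode to a negative int
            byte[] buf = new byte[Math.max(len, 0)]; // avoid NegativeArraySizeException in the demo
            in.readFully(buf, 0, len);               // a negative len is rejected here
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] valid = {0, 0, 0, 2, 'h', 'i'};
        byte[] garbage = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 'x'};
        System.out.println("valid record throws:   " + readRecordThrowsIOOBE(valid));
        System.out.println("garbage record throws: " + readRecordThrowsIOOBE(garbage));
    }
}
```

The point is only that a reader positioned in the middle of a record can decode 
arbitrary bytes as a length, and a negative value then fails inside readFully 
rather than as an ordinary IOException.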

The fix I can think of is to reset the reader's 'position' to the last valid 
position (upon return from the method readAllEntriesToReplicateOrNextFile).
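
A rough sketch of what I mean, not a patch: remember the offset after each 
entry that was read successfully, and on an IOException seek back to that 
offset instead of trusting wherever the file pointer ended up. EntryReader, 
FlakyReader and readAll below are hypothetical stand-ins written for this 
sketch; they are not the ReplicationSource or HLog reader APIs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PositionResetSketch {
    /** Hypothetical stand-in for the log reader; not the HBase/Hadoop API. */
    public interface EntryReader {
        String next() throws IOException; // returns null at end of file
        long getPosition();
        void seek(long pos);
    }

    /** Test double: fails once at a given offset and leaves its pointer in garbage. */
    public static final class FlakyReader implements EntryReader {
        private final String[] entries;
        private final int failAt;
        private boolean failed;
        private long pos;

        public FlakyReader(String[] entries, int failAt) {
            this.entries = entries;
            this.failAt = failAt;
        }

        public String next() throws IOException {
            if (!failed && pos == failAt) {
                failed = true;
                pos += 1000; // simulate a file pointer stranded mid-record
                throw new IOException("simulated partial read");
            }
            if (pos >= entries.length) {
                return null;
            }
            return entries[(int) pos++];
        }

        public long getPosition() { return pos; }
        public void seek(long p) { pos = p; }
    }

    /** Last offset known to sit on an entry boundary. */
    public long position;

    /** Reads entries until EOF or an IOException; never records a bad offset. */
    public List<String> readAll(EntryReader reader) {
        List<String> entries = new ArrayList<>();
        long lastValid = position;
        try {
            String entry;
            while ((entry = reader.next()) != null) {
                entries.add(entry);
                lastValid = reader.getPosition(); // advance only after a clean read
            }
        } catch (IOException ioe) {
            reader.seek(lastValid); // undo whatever the failed read did to the pointer
        }
        position = lastValid;
        return entries;
    }

    public static void main(String[] args) {
        PositionResetSketch source = new PositionResetSketch();
        FlakyReader reader = new FlakyReader(new String[] {"a", "b", "c"}, 2);
        System.out.println(source.readAll(reader) + " position=" + source.position);
        System.out.println(source.readAll(reader) + " position=" + source.position);
    }
}
```

In the sketch the second call picks up at the entry that failed the first 
time, because the recorded position never moved past the last good entry.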

Thoughts on the above? Does the analysis make sense?
                
> [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-6649
>                 URL: https://issues.apache.org/jira/browse/HBASE-6649
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.96.0, 0.92.3, 0.94.2
>
>         Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 
> 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - 
> queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover 
> [Jenkins].html
>
>
> Have seen it twice in the recent past: http://bit.ly/MPCykB & 
> http://bit.ly/O79Dq7 .. 
> Looking briefly at the logs hints at a pattern - in both the failed test 
> instances, there was an RS crash while the test was running.
