[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]

Jean-Daniel Cryans (JIRA) Wed, 19 Sep 2012 14:47:09 -0700

    [ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459137#comment-13459137 ]


Jean-Daniel Cryans commented on HBASE-6649:
-------------------------------------------

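The server running the patch hit "Break on IOE" twice, and the handling seems to work. In the excerpt below, it replicates the 5 entries it had already read. It then reports position 21993911 (the entryStart of the entry that failed to read) and reopens the log at that same position: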

{noformat}
2012-09-19 21:26:50,104 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r6s44%2C10304%2C1348088378534.1348089931722 at 21992487
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Break on IOE: hdfs://va1r5s41:10101/va1-backup/.logs/va1r6s44,10304,1348088378534/va1r6s44%2C10304%2C1348088378534.1348089931722, entryStart=21993911, pos=22058496, end=22058496, edit=5
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:783007 and seenEntries:5 and size: 64585
2012-09-19 21:26:50,110 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 5
2012-09-19 21:26:50,119 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #va1r6s44%2C10304%2C1348088378534.1348089931722 for position 21993911 in hdfs://va1r5s41:10101/va1-backup/.logs/va1r6s44,10304,1348088378534/va1r6s44%2C10304%2C1348088378534.1348089931722
2012-09-19 21:26:50,129 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Removing 0 logs in the list: []
2012-09-19 21:26:50,129 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicated in total: 145502
2012-09-19 21:26:50,129 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication va1r6s44%2C10304%2C1348088378534.1348089931722 at 21993911
{noformat}
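The recovery sequence in that log can be sketched in Java as below: stop reading on the IOException, ship the entries already read, then reopen the log at the entryStart of the entry that failed. This is a hypothetical illustration, not the code in the attached patches. `EntryReader`, `InMemoryReader`, `Batch` and `readBatch` are invented names, and the in-memory reader only simulates a log whose last entry is truncated.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the HBASE-6649 patch: read a batch of log entries,
 * and on an IOException ("Break on IOE") keep the entries already read and
 * resume the next pass at the start of the entry that failed.
 */
public class BreakOnIoeSketch {

    /** Assumed minimal reader interface; not an HBase or Hadoop API. */
    interface EntryReader {
        long getPosition();
        /** Returns the next entry, or null at a clean end of file. */
        String next() throws IOException;
    }

    /** One pass of the loop: entries to ship and where to reopen the log. */
    static final class Batch {
        final List<String> entries = new ArrayList<>();
        long resumePosition;
    }

    static Batch readBatch(EntryReader reader, int maxEntries) {
        Batch batch = new Batch();
        batch.resumePosition = reader.getPosition();
        while (batch.entries.size() < maxEntries) {
            long entryStart = reader.getPosition();  // where this entry begins
            String entry;
            try {
                entry = reader.next();
            } catch (IOException ioe) {
                // Could not read a complete entry: stop here, ship what was read,
                // and come back to entryStart on the next pass.
                batch.resumePosition = entryStart;
                break;
            }
            if (entry == null) {                     // clean end of file
                break;
            }
            batch.entries.add(entry);
            batch.resumePosition = reader.getPosition();  // just past the last good entry
        }
        return batch;
    }

    /**
     * Hypothetical in-memory log: entry i spans [bounds[i], bounds[i+1]);
     * any bytes between the last bound and fileEnd form a truncated entry.
     */
    static final class InMemoryReader implements EntryReader {
        private final long[] bounds;
        private final long fileEnd;
        private int index = 0;
        private long position;

        InMemoryReader(long[] bounds, long fileEnd) {
            this.bounds = bounds;
            this.fileEnd = fileEnd;
            this.position = bounds[0];
        }

        @Override
        public long getPosition() {
            return position;
        }

        @Override
        public String next() throws IOException {
            if (index + 1 < bounds.length) {         // a complete entry remains
                position = bounds[index + 1];
                return "edit-" + index++;
            }
            if (position < fileEnd) {                // truncated trailing entry
                position = fileEnd;                  // like pos=end in the log above
                throw new EOFException("partial entry at " + bounds[index]);
            }
            return null;
        }
    }

    public static void main(String[] args) {
        // Three complete entries, then a partial one from offset 30 to 35.
        Batch b = readBatch(new InMemoryReader(new long[] {0, 10, 20, 30}, 35), 100);
        System.out.println(b.entries.size() + " entries, resume at " + b.resumePosition);
        // prints "3 entries, resume at 30"
    }
}
```

As in the log, the reader's own position has moved to the end of the file (35). The resume position stays at the start of the partial entry (30), so nothing is skipped when the log is reopened.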

One thing I noticed that this patch breaks is the size in 
"currentNbOperations:783007 and seenEntries:5 and size: 64585", because that 
computation relies on this.position still holding the position from the 
beginning of the loop. I often see that number at 0 even when there are edits 
to replicate. It's minor, since I'm removing that log message altogether in 
HBASE-6804, but within the context of this jira we may want to either remove 
the size or keep track of what the position was at the beginning of the loop.
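The numbers in the log excerpt are consistent with the size being measured from the start of the failed entry rather than from where the log was opened. 22058496 - 21993911 = 64585, the logged size, whereas 22058496 - 21992487 = 66009. A minimal Java sketch of the second option, keeping track of the position at the beginning of the loop, is below. It assumes, as the comment implies, that the patch now advances this.position inside the read loop. `BatchSizeSketch`, `readLoop` and the per-entry update are hypothetical stand-ins, not the actual ReplicationSource code.

```java
/**
 * Hypothetical sketch, not ReplicationSource: "position" stands in for
 * this.position, assumed here to be advanced after every entry read.
 */
public class BatchSizeSketch {
    private long position;   // stand-in for ReplicationSource's this.position

    BatchSizeSketch(long openedAt) {
        this.position = openedAt;
    }

    /**
     * Simulates one pass of the read loop. entryEnds are the offsets where each
     * successfully read entry ends; readerPos is where the reader stopped.
     * Returns {sizeFromAdvancedPosition, sizeFromLoopStart}.
     */
    long[] readLoop(long[] entryEnds, long readerPos) {
        final long startPosition = position;   // proposed fix: remember the loop start
        for (long end : entryEnds) {
            position = end;                     // assumed per-entry advance
        }
        long sizeFromAdvancedPosition = readerPos - position;  // what gets logged now
        long sizeFromLoopStart = readerPos - startPosition;    // bytes the reader moved over this pass
        return new long[] {sizeFromAdvancedPosition, sizeFromLoopStart};
    }

    public static void main(String[] args) {
        // Offsets from the log excerpt: opened at 21992487, the failed entry starts
        // at 21993911 (end of the last good entry), reader stopped at 22058496.
        // Intermediate entry ends are not in the log, so only the last one is passed.
        BatchSizeSketch s = new BatchSizeSketch(21992487L);
        long[] sizes = s.readLoop(new long[] {21993911L}, 22058496L);
        System.out.println("advanced: " + sizes[0] + ", from loop start: " + sizes[1]);
        // prints "advanced: 64585, from loop start: 66009"
    }
}
```

When a pass ends exactly at the end of the last good entry, the first value is 0. That would match seeing the number at 0 while edits are still being replicated.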
                
> [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-6649
>                 URL: https://issues.apache.org/jira/browse/HBASE-6649
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.96.0, 0.92.3, 0.94.2
>
>         Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 
> 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 
> 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 
> #502 test - queueFailover [Jenkins].html
>
>
> Have seen it twice in the recent past: http://bit.ly/MPCykB and 
> http://bit.ly/O79Dq7.
> A brief look at the logs hints at a pattern: in both failed test 
> instances, a region server (RS) crashed while the test was running.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
