[ 
https://issues.apache.org/jira/browse/HBASE-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587592#comment-13587592
 ] 

Jeffrey Zhong commented on HBASE-7458:
--------------------------------------

I checked the error log (only one is left by now) and found that the root cause
is the test case itself running too slowly. Below are the related log lines:

...
2013-02-16 19:38:12,188 INFO  [Thread-652] replication.TestReplicationQueueFailover(61): Start loading table
...
2013-02-16 19:43:05,458 DEBUG [Thread-652] client.ClientScanner(90): Creating scanner over test starting at key ''
2013-02-16 19:43:05,458 DEBUG [Thread-652] client.ClientScanner(198): Advancing internal scanner to startKey at '
...
2013-02-16 19:43:07,101 DEBUG [RegionServer:1;janus.apache.org,42433,1361043425223.replicationSource,2] regionserver.ReplicationSource(640): Replicating 1
2013-02-16 19:43:07,474 DEBUG [RegionServer:1;janus.apache.org,42433,1361043425223.replicationSource,2] regionserver.ReplicationSource(577): Nothing to replicate, sleeping 100 times 1
...
2013-02-16 19:43:08,087 INFO  [Thread-652] replication.TestReplicationQueueFailover(108): Only got 17504 rows instead of 17576 current i=-71
...
2013-02-16 19:43:10,087 DEBUG [Thread-652] client.ClientScanner(90): Creating scanner over test starting at key ''
2013-02-16 19:43:10,087 DEBUG [Thread-652] client.ClientScanner(198): Advancing internal scanner to startKey at ''
...
2013-02-16 19:43:12,192 INFO  [pool-1-thread-1] hbase.HBaseTestingUtility(703): Shutting down minicluster
...

As you can see above, the test case started at 19:38:12 and finished at
19:43:12, exactly 5 minutes later.
The second-to-last scan started at 19:43:05 and completed at 19:43:08, while
the replication source only finished reading at 19:43:07 ("Nothing to
replicate"). That is why we got the message "Only got 17504 rows instead of
17576": replication was still in progress when the scan started.
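The timing argument above can be checked directly against the log timestamps. Below is a minimal Java sketch that computes the gaps quoted above. The class name `LogTiming` is hypothetical, and the sketch assumes all timestamps fall on the same day (2013-02-16), so only the time-of-day part is parsed.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class LogTiming {
    // Matches the "HH:mm:ss,SSS" time-of-day part of the log lines quoted above.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps taken on the same day.
    static long millisBetween(String from, String to) {
        LocalTime start = LocalTime.parse(from, LOG_TIME);
        LocalTime end = LocalTime.parse(to, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // "Start loading table" -> "Shutting down minicluster": 300004 ms
        System.out.println("test run: "
                + millisBetween("19:38:12,188", "19:43:12,192") + " ms");
        // Scan created -> "Only got 17504 rows instead of 17576": 2629 ms
        System.out.println("second-to-last scan: "
                + millisBetween("19:43:05,458", "19:43:08,087") + " ms");
        // Scan created -> replication source still logging "Replicating 1": 1643 ms
        System.out.println("scan start -> \"Replicating 1\": "
                + millisBetween("19:43:05,458", "19:43:07,101") + " ms");
    }
}
```

The run used the full 5-minute budget, and the replication source was still active about 1.6 s into a scan that took about 2.6 s, which is consistent with the scan missing some of the rows.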

The last scan started at 19:43:10, and before it could complete, the cluster was
shut down at 2013-02-16 19:43:12. Therefore, the failure is simply the test case
running too slowly on the build box. I came up with a fix that increases the
replication batch size so that the test case runs faster and finishes in time
even on a slow machine.
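The same timestamps show why the last scan could not finish, and a crude model suggests why a larger batch helps. In the sketch below, `ScanWindow` and `rounds` are hypothetical names. The batch sizes are illustrative, not the values used in hbase-7458.patch. The model assumes each shipping round carries at most a fixed number of edits with roughly constant overhead per round; it is not how ReplicationSource actually sizes its batches. The 17576 figure is the expected row count from the log above.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class ScanWindow {
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Milliseconds elapsed between two same-day log timestamps.
    static long millisBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from, LOG_TIME),
                LocalTime.parse(to, LOG_TIME)).toMillis();
    }

    // Hypothetical simplification: rounds needed to ship `edits` edits when each
    // round carries at most `batchSize` edits (ignores size limits and sleeps).
    static long rounds(long edits, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (edits + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        long previousScan = millisBetween("19:43:05,458", "19:43:08,087"); // 2629 ms
        long lastWindow = millisBetween("19:43:10,087", "19:43:12,192");   // 2105 ms
        System.out.println("last scan had " + lastWindow
                + " ms before shutdown; the previous scan needed " + previousScan + " ms");
        // Illustrative batch sizes only: fewer rounds means less per-round overhead.
        for (long batch : new long[] {1_000, 5_000, 20_000}) {
            System.out.println("batch " + batch + " -> " + rounds(17_576, batch) + " rounds");
        }
    }
}
```

The last scan had about 2.1 s before shutdown, less than the roughly 2.6 s the previous scan took, so it could not have completed. Under this model, a larger batch ships the same edits in fewer rounds, leaving more of the fixed test budget for verification.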

Thanks,
-Jeffrey
 




                
> TestReplicationWithCompression fails intermittently in both PreCommit and 
> trunk builds
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-7458
>                 URL: https://issues.apache.org/jira/browse/HBASE-7458
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>              Labels: tes
>             Fix For: 0.95.0
>
>         Attachments: ErrorLog.rtf, hbase-7458.patch
>
>
> TestReplicationWithCompression has been failing often.
> Here are a few examples:
> https://builds.apache.org/job/PreCommit-HBASE-Build/3755/testReport/
> https://builds.apache.org/job/HBase-TRUNK/3672/testReport/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/testDeleteTypes/
> https://builds.apache.org/job/HBase-0.94/677/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/queueFailover/
