[
https://issues.apache.org/jira/browse/HBASE-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587592#comment-13587592
]
Jeffrey Zhong commented on HBASE-7458:
--------------------------------------
I checked the error log (only one is left by now) and found that the root cause
is the test case itself running too slowly. The relevant log lines are below:
...
2013-02-16 19:38:12,188 INFO [Thread-652]
replication.TestReplicationQueueFailover(61): Start loading table
...
2013-02-16 19:43:05,458 DEBUG [Thread-652] client.ClientScanner(90): Creating
scanner over test starting at key ''
2013-02-16 19:43:05,458 DEBUG [Thread-652] client.ClientScanner(198): Advancing
internal scanner to startKey at ''
...
2013-02-16 19:43:07,101 DEBUG
[RegionServer:1;janus.apache.org,42433,1361043425223.replicationSource,2]
regionserver.ReplicationSource(640): Replicating 1
2013-02-16 19:43:07,474 DEBUG
[RegionServer:1;janus.apache.org,42433,1361043425223.replicationSource,2]
regionserver.ReplicationSource(577): Nothing to replicate, sleeping 100 times 1
...
2013-02-16 19:43:08,087 INFO [Thread-652]
replication.TestReplicationQueueFailover(108): Only got 17504 rows instead of
17576 current i=-71
...
2013-02-16 19:43:10,087 DEBUG [Thread-652] client.ClientScanner(90): Creating
scanner over test starting at key ''
2013-02-16 19:43:10,087 DEBUG [Thread-652] client.ClientScanner(198): Advancing
internal scanner to startKey at ''
...
2013-02-16 19:43:12,192 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(703):
Shutting down minicluster
...
As you can see from the above, the test case started at 19:38:12 and completed
at 19:43:12, exactly 5 minutes later.
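As a quick sanity check on that interval, here is a minimal sketch (not part of
the patch) that subtracts the first and last timestamps quoted above. The
helper name parse_log_ts and the format string are assumptions made for
illustration; the only inputs are the two log timestamps.

```python
from datetime import datetime

# Assumed layout of the log timestamps quoted above: date, time, comma, millis.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(ts: str) -> datetime:
    """Parse a timestamp such as '2013-02-16 19:38:12,188' (hypothetical helper)."""
    return datetime.strptime(ts, LOG_TS_FORMAT)


# "Start loading table" and "Shutting down minicluster" from the log excerpt.
test_start = parse_log_ts("2013-02-16 19:38:12,188")
cluster_shutdown = parse_log_ts("2013-02-16 19:43:12,192")

elapsed = cluster_shutdown - test_start
print(f"elapsed: {elapsed.total_seconds():.3f} s")
```

The difference comes out within a few milliseconds of 300 seconds, which is
consistent with the test hitting a 5 minute limit rather than failing on its
own.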
The second-to-last scan started at 19:43:05 and completed at 19:43:08, while
the replication source only finished reading at 19:43:07 ("Nothing to
replicate"). Replication was therefore still in progress when the scan started,
which is why we got the message "Only got 17504 rows instead of 17576".
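The ordering argument can be checked the same way. This is only a sketch under
the assumption that the "Only got ... rows" line marks the end of the scan and
that "Nothing to replicate" marks the point where the replication source went
idle; the variable names are made up for illustration.

```python
from datetime import datetime


def parse_log_ts(ts: str) -> datetime:
    """Parse a log timestamp such as '2013-02-16 19:43:05,458' (hypothetical helper)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


# Timestamps copied from the log excerpt above.
scan_start = parse_log_ts("2013-02-16 19:43:05,458")        # "Creating scanner"
replication_idle = parse_log_ts("2013-02-16 19:43:07,474")  # "Nothing to replicate"
scan_end = parse_log_ts("2013-02-16 19:43:08,087")          # "Only got 17504 rows"

# Replication was still running if it went idle only after the scan had begun.
replication_overlapped_scan = scan_start < replication_idle < scan_end

# Row counts taken from the "Only got ... instead of ..." message.
missing_rows = 17576 - 17504

print(replication_overlapped_scan, missing_rows)
```

If the assumption about which log lines bound the scan holds, the scan began
about two seconds before replication went idle, so a short row count is what
one would expect.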
The last scan started at 19:43:10, but before it could complete, the cluster
began shutting down at 2013-02-16 19:43:12. The test case therefore simply runs
too slowly on the build box. I came up with a fix that logically increases the
replication batch size to speed up the test case, so that it finishes in time
even on a slow machine.
Thanks,
-Jeffrey
> TestReplicationWithCompression fails intermittently in both PreCommit and
> trunk builds
> --------------------------------------------------------------------------------------
>
> Key: HBASE-7458
> URL: https://issues.apache.org/jira/browse/HBASE-7458
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Jeffrey Zhong
> Priority: Critical
> Labels: tes
> Fix For: 0.95.0
>
> Attachments: ErrorLog.rtf, hbase-7458.patch
>
>
> TestReplicationWithCompression has been failing often.
> Here are a few examples:
> https://builds.apache.org/job/PreCommit-HBASE-Build/3755/testReport/
> https://builds.apache.org/job/HBase-TRUNK/3672/testReport/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/testDeleteTypes/
> https://builds.apache.org/job/HBase-0.94/677/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationWithCompression/queueFailover/