[
https://issues.apache.org/jira/browse/HBASE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894153#comment-13894153
]
Hadoop QA commented on HBASE-10482:
-----------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12627520/HBASE-10249.patch
against trunk revision .
ATTACHMENT ID: 12627520
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new
or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop
1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop
1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/8620//console
This message is automatically generated.
> ReplicationSyncUp doesn't clean up its ZK, needed for tests
> -----------------------------------------------------------
>
> Key: HBASE-10482
> URL: https://issues.apache.org/jira/browse/HBASE-10482
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.96.1, 0.94.16
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.17
>
> Attachments: HBASE-10249.patch
>
>
> TestReplicationSyncUpTool failed again:
> https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationSyncUpTool/testSyncUpTool/
> It's not super obvious why only one of the two tables is replicated, the test
> could use some more logging, but I understand it this way:
> The first ReplicationSyncUp gets started and for some reason it cannot
> replicate the data:
> {noformat}
> 2014-02-06 21:32:19,811 INFO [Thread-1372]
> regionserver.ReplicationSourceManager(203): Current list of replicators:
> [1391722339091.SyncUpTool.replication.org,1234,1,
> quirinus.apache.org,37045,1391722237951,
> quirinus.apache.org,33502,1391722238125] other RSs: []
> 2014-02-06 21:32:19,811 INFO [Thread-1372.replicationSource,1]
> regionserver.ReplicationSource(231): Replicating
> db42e7fc-7f29-4038-9292-d85ea8b9994b -> 783c0ab2-4ff9-4dc0-bb38-86bf31d1d817
> 2014-02-06 21:32:19,892 TRACE [Thread-1372.replicationSource,2]
> regionserver.ReplicationSource(596): No log to process, sleeping 100 times 1
> 2014-02-06 21:32:19,911 TRACE [Thread-1372.replicationSource,1]
> regionserver.ReplicationSource(596): No log to process, sleeping 100 times 1
> 2014-02-06 21:32:20,094 TRACE [Thread-1372.replicationSource,2]
> regionserver.ReplicationSource(596): No log to process, sleeping 100 times 2
> ...
> 2014-02-06 21:32:23,414 TRACE [Thread-1372.replicationSource,1]
> regionserver.ReplicationSource(596): No log to process, sleeping 100 times 8
> 2014-02-06 21:32:23,673 INFO [ReplicationExecutor-0]
> replication.ReplicationQueuesZKImpl(169): Moving
> quirinus.apache.org,37045,1391722237951's hlogs to my queue
> 2014-02-06 21:32:23,768 DEBUG [ReplicationExecutor-0]
> replication.ReplicationQueuesZKImpl(396): Creating
> quirinus.apache.org%2C37045%2C1391722237951.1391722243779 with data 10803
> 2014-02-06 21:32:23,842 DEBUG [ReplicationExecutor-0]
> replication.ReplicationQueuesZKImpl(396): Creating
> quirinus.apache.org%2C37045%2C1391722237951.1391722243779 with data 10803
> 2014-02-06 21:32:24,297 TRACE [Thread-1372.replicationSource,2]
> regionserver.ReplicationSource(596): No log to process, sleeping 100 times 9
> 2014-02-06 21:32:24,314 TRACE [Thread-1372.replicationSource,1]
> regionserver.ReplicationSource(596): No log to process, sleeping 100 times 9
> {noformat}
> Finally it gives up:
> {noformat}
> 2014-02-06 21:32:30,873 DEBUG [Thread-1372]
> replication.TestReplicationSyncUpTool(323): SyncUpAfterDelete failed at retry
> = 0, with rowCount_ht1TargetPeer1 =100 and rowCount_ht2TargetAtPeer1 =200
> {noformat}
> The syncUp tool has an ID you can follow, grep for
> syncupReplication1391722338885 or just the timestamp, and you can see it
> doing things after that. The reason is that the tool closes the
> ReplicationSourceManager but not the ZK connection, so events _still_ come in
> and NodeFailoverWorker _still_ tries to recover queues but then there's
> nothing to process them.
> Later in the logs you can see:
> {noformat}
> 2014-02-06 21:32:37,381 INFO [ReplicationExecutor-0]
> replication.ReplicationQueuesZKImpl(169): Moving
> quirinus.apache.org,33502,1391722238125's hlogs to my queue
> 2014-02-06 21:32:37,567 INFO [ReplicationExecutor-0]
> replication.ReplicationQueuesZKImpl(239): Won't transfer the queue, another
> RS took care of it because of: KeeperErrorCode = NoNode for
> /1/replication/rs/quirinus.apache.org,33502,1391722238125/lock
> {noformat}
> There shouldn't' be any racing, but now someone already moved
> "quirinus.apache.org,33502,1391722238125" away.
> FWIW I can't even make the test fail on my machine so I'm not 100% sure
> closing the ZK connection fixes the issue, but at least it's the right thing
> to do.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)