[
https://issues.apache.org/jira/browse/HBASE-12865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608362#comment-14608362
]
Hadoop QA commented on HBASE-12865:
-----------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12742805/HBASE-12865-V1.diff
against master branch at commit f8bd578b80b4e656d799c82ca1b6191e35bb0ae4.
ATTACHMENT ID: 12742805
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new
or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the
total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the
total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/14616//testReport/
Release Findbugs (version 2.0.3) warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/14616//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/14616//artifact/patchprocess/checkstyle-aggregate.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/14616//console
This message is automatically generated.
> WALs may be deleted before they are replicated to peers
> -------------------------------------------------------
>
> Key: HBASE-12865
> URL: https://issues.apache.org/jira/browse/HBASE-12865
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Liu Shaohui
> Assignee: He Liangliang
> Priority: Critical
> Attachments: HBASE-12865-V1.diff
>
>
> By design, ReplicationLogCleaner guarantees that WALs still in a
> replication queue cannot be deleted by the HMaster. The
> ReplicationLogCleaner gets the WAL set from ZooKeeper by scanning the
> replication zk node. But it may get an incomplete WAL set during replication
> failover, because the scan operation is not atomic.
> For example, suppose there are three region servers (rs1, rs2, rs3) and one
> peer with id 10. The layout of the replication ZooKeeper nodes is:
> {code}
> /hbase/replication/rs/rs1/10/wals
> /hbase/replication/rs/rs2/10/wals
> /hbase/replication/rs/rs3/10/wals
> {code}
> - t1: the ReplicationLogCleaner finishes scanning the replication queue of
> rs1 and starts to scan the queue of rs2.
> - t2: region server rs3 goes down, and rs1 takes over rs3's replication queue.
> The new layout is:
> {code}
> /hbase/replication/rs/rs1/10/wals
> /hbase/replication/rs/rs1/10-rs3/wals
> /hbase/replication/rs/rs2/10/wals
> /hbase/replication/rs/rs3
> {code}
> - t3: the ReplicationLogCleaner finishes scanning the queue of rs2 and starts
> to scan the node of rs3. But the queue has already been moved to
> "/hbase/replication/rs/rs1/10-rs3/wals".
> So the ReplicationLogCleaner misses the WALs of rs3 for peer 10, and the
> HMaster may delete these WALs before they are replicated to peer clusters.
> We encountered this problem in our cluster, and I think it is a serious bug
> for replication.
> Suggestions for fixing this bug are welcome. thx~
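
The t1/t2/t3 timeline in the description can be reproduced with a small in-memory model. The sketch below is illustrative only: `FakeZk`, `scan_once`, `scan_until_stable`, the WAL names, and the `version` counter are hypothetical names and not HBase or ZooKeeper APIs. The retry-on-version-change loop is one possible mitigation, shown under the assumption that the parent node exposes a counter that changes whenever a queue is moved. It is not necessarily what the attached patch does.

```python
from typing import Callable, Dict, List, Optional, Set


class FakeZk:
    """Hypothetical stand-in for the replication znodes: server -> queue id -> WALs."""

    def __init__(self, queues: Dict[str, Dict[str, List[str]]]) -> None:
        self.queues = queues
        self.version = 0  # assumed counter, bumped on every queue move

    def list_servers(self) -> List[str]:
        return sorted(self.queues)

    def list_wals(self, server: str) -> Set[str]:
        # A server node without queues simply yields no WALs.
        return {w for wals in self.queues.get(server, {}).values() for w in wals}

    def failover(self, dead: str, survivor: str) -> None:
        """Move every queue of `dead` to `survivor`, renamed to '<queue>-<dead>'."""
        for queue_id, wals in self.queues[dead].items():
            self.queues[survivor][f"{queue_id}-{dead}"] = wals
        self.queues[dead] = {}
        self.version += 1


Hook = Optional[Callable[[str], None]]


def scan_once(zk: FakeZk, before_server: Hook = None) -> Set[str]:
    """Non-atomic scan: one read per server. The hook injects concurrent changes."""
    seen: Set[str] = set()
    for server in zk.list_servers():
        if before_server is not None:
            before_server(server)
        seen |= zk.list_wals(server)
    return seen


def scan_until_stable(zk: FakeZk, before_server: Hook = None) -> Set[str]:
    """Repeat the scan until the version counter did not change during it."""
    while True:
        start = zk.version
        seen = scan_once(zk, before_server)
        if zk.version == start:
            return seen


def make_zk() -> FakeZk:
    # Same layout as the example: rs1, rs2, rs3, each with a queue for peer 10.
    return FakeZk({rs: {"10": [f"wal-{rs}"]} for rs in ("rs1", "rs2", "rs3")})


def rs3_dies_before_rs2_is_read(zk: FakeZk) -> Callable[[str], None]:
    """Build a hook that fires the failover once, after rs1 has been scanned (t2)."""
    fired: List[bool] = []

    def hook(server: str) -> None:
        if server == "rs2" and not fired:
            fired.append(True)
            zk.failover("rs3", "rs1")

    return hook
```

In this model, `scan_once` with the hook never sees `wal-rs3`: rs1 was read before the queue arrived there, and rs3 is read after the queue left. `scan_until_stable` notices the version change and rescans, so the moved queue is picked up under rs1 on the second pass.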