[ 
https://issues.apache.org/jira/browse/HBASE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340265#comment-17340265
 ] 

Hudson commented on HBASE-25741:
--------------------------------

Results for branch branch-1
        [build #122 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/122/]:
 (x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/122//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/122//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/122//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Deadlock during peer cleanup with NoNodeException
> -------------------------------------------------
>
>                 Key: HBASE-25741
>                 URL: https://issues.apache.org/jira/browse/HBASE-25741
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.7.0
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major
>              Labels: regression
>             Fix For: 1.7.0
>
>
> We have observed that replication source metrics for a peer persist on some 
> region servers even after the peer has been removed. This happens because, 
> when ReplicationSource encounters a NoNodeException, it calls the 
> `peerRemoved` workflow, which should eventually terminate the source and 
> remove it from the source manager. The problem is that the 
> ReplicationSource thread terminates itself, so the removePeer action never 
> completes and the source's metrics are left in place forever. The flow is: 
> the replication source tries to clean WALs 
> [here|https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L801]
>  and, on NoNodeException, calls 
> [peerRemoved|https://github.com/apache/hbase/blob/b231dd620f107b488b88599e16dc846eb856972c/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java#L244],
>  which terminates the source (itself), leaving the terminated source in the 
> source manager without clearing its 
> [metrics|https://github.com/apache/hbase/blob/b231dd620f107b488b88599e16dc846eb856972c/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java#L645].
>  
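
The flow described above can be sketched in isolation. The following is a minimal, hypothetical model and not the HBase code: the `Manager` and `Source` classes, their fields, and the interrupt-then-join behaviour of `terminate()` are all assumptions made for illustration. It shows one way a worker thread that triggers its own removal can cut the cleanup short, so that the bookkeeping steps after termination never run.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SelfTerminationSketch {

    /** Hypothetical stand-in for a source manager; not the HBase class. */
    static class Manager {
        final Set<Source> sources = ConcurrentHashMap.newKeySet();
        final Map<String, Long> metrics = new ConcurrentHashMap<>();

        void register(Source s) {
            sources.add(s);
            metrics.put(s.peerId, 0L);
        }

        void peerRemoved(Source s) throws InterruptedException {
            s.terminate();             // step 1: stop the source thread
            sources.remove(s);         // step 2: skipped if step 1 throws
            metrics.remove(s.peerId);  // step 3: skipped if step 1 throws
        }
    }

    /** Hypothetical stand-in for a replication source thread. */
    static class Source extends Thread {
        final String peerId;
        final Manager manager;

        Source(String peerId, Manager manager) {
            this.peerId = peerId;
            this.manager = manager;
        }

        // Assumed termination logic: interrupt the thread, then wait for it.
        // When the caller is this very thread, the interrupt flag is already
        // set, so join() throws InterruptedException instead of returning.
        void terminate() throws InterruptedException {
            interrupt();
            join();
        }

        @Override
        public void run() {
            try {
                // Simulates the source reacting to a missing node by asking
                // the manager to remove the peer, from its own thread.
                manager.peerRemoved(this);
            } catch (InterruptedException e) {
                // The thread exits here; steps 2 and 3 never ran.
            }
        }
    }

    /** Returns {source still registered, metrics still present}. */
    static boolean[] runScenario() {
        Manager m = new Manager();
        Source s = new Source("peer-1", m);
        m.register(s);
        s.start();
        try {
            s.join(); // the main thread waits for the source thread to exit
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new boolean[] { m.sources.contains(s), m.metrics.containsKey("peer-1") };
    }

    public static void main(String[] args) {
        boolean[] r = runScenario();
        System.out.println("source still registered: " + r[0]);
        System.out.println("metrics still present: " + r[1]);
    }
}
```

In this sketch both values come out `true`: the source thread has exited, yet its entry and its metrics remain in the manager. The actual mechanism in branch-1 may differ (the issue title mentions a deadlock), so treat this only as an illustration of the ordering problem.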



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
