[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910741#comment-16910741 ]

ASF subversion and git services commented on GEODE-3780:
--------------------------------------------------------

Commit 9975d1e10a905b040edeefa0ecb2210d1a1c1525 in geode's branch 
refs/heads/feature/merge_geode_3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9975d1e ]

GEODE-3780 suspected member is never watched again after passing final check 
(#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check", a member is subject to suspect
processing again, but we weren't processing the suspect message locally.
As a result, JoinLeave was never notified of the suspect, so removal
was never initiated.

I also noticed that a method in HealthMonitor was misnamed. It claimed
to return the set of members that had failed availability checks, but
it actually returned the set of members currently under suspicion. I
renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test

(cherry picked from commit 8e9b04470264983d0aa1c7900f6e9be2374549d9)
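The fix described in the commit message above can be sketched in Java, the language Geode is written in. This is a minimal, hypothetical illustration, not Geode's code: `SuspectProcessingSketch`, `HealthMonitorSketch`, `SuspectListener`, `suspect()` and `finalCheckPassed()` are invented names, the listener is assumed to stand in for JoinLeave, and a list is assumed to stand in for sending a suspect message to other members. It shows the two behaviors the commit relies on: passing a final check clears the suspicion so the member can be suspected again, and a new suspicion is delivered to the local listener as well as to peers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of suspect processing; names are illustrative, not Geode's API. */
public class SuspectProcessingSketch {

    /** Hypothetical stand-in for JoinLeave, which initiates removal of a suspect. */
    interface SuspectListener {
        void memberSuspected(String member, String reason);
    }

    /** Hypothetical stand-in for HealthMonitor's suspect bookkeeping. */
    static final class HealthMonitorSketch {
        private final Set<String> membersUnderSuspicion = new HashSet<>();
        private final List<String> suspectMessagesSent = new ArrayList<>();
        private final SuspectListener localListener;

        HealthMonitorSketch(SuspectListener localListener) {
            this.localListener = localListener;
        }

        /** Raises suspicion of a member unless it is already under suspicion. */
        void suspect(String member, String reason) {
            if (!membersUnderSuspicion.add(member)) {
                return; // a check for this member is already in progress
            }
            // Stands in for sending a suspect message to the other members.
            suspectMessagesSent.add(member);
            // The step the bug skipped: process the suspicion locally as well,
            // so the local JoinLeave-like listener can initiate removal.
            localListener.memberSuspected(member, reason);
        }

        /** Clears suspicion after a passed final check so the member is watched again. */
        void finalCheckPassed(String member) {
            membersUnderSuspicion.remove(member);
        }

        /** Members currently under suspicion (not members that failed a check). */
        Set<String> getMembersUnderSuspicion() {
            return Collections.unmodifiableSet(membersUnderSuspicion);
        }

        List<String> getSuspectMessagesSent() {
            return Collections.unmodifiableList(suspectMessagesSent);
        }
    }

    public static void main(String[] args) {
        List<String> notified = new ArrayList<>();
        HealthMonitorSketch monitor =
                new HealthMonitorSketch((member, reason) -> notified.add(member));

        monitor.suspect("memberB", "no heartbeat");
        monitor.finalCheckPassed("memberB");        // final check passed
        monitor.suspect("memberB", "no heartbeat"); // suspected again later

        System.out.println(notified); // prints [memberB, memberB]
    }
}
```

Without the local `memberSuspected` call, the second suspicion would reach peers but never the local listener, which mirrors the symptom the commit describes.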


> suspected member is never watched again after passing final check
> -----------------------------------------------------------------
>
>                 Key: GEODE-3780
>                 URL: https://issues.apache.org/jira/browse/GEODE-3780
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side. One of the 
> final checks mysteriously succeeded:
>
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 <Geode Failure 
> Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135)<v2>:1026
>
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 <Geode Failure 
> Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135)<v2>:1026
>
> After this, the suspected member was never checked again, and the losing side 
> failed to shut down.
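The two log lines above show a final check that failed and was then reported as passed because recent message traffic from the suspect had been detected. A minimal Java sketch of that decision follows; it is hypothetical, not Geode's implementation. `FinalCheckSketch`, `finalCheck()`, the boolean probe result and the `recentWindowMillis` threshold are assumptions introduced for illustration, since the report does not say how the check probes the member or how "recent" is defined.

```java
/** Hypothetical sketch of a final-check decision; not Geode's implementation. */
public class FinalCheckSketch {

    enum Outcome { PASSED, FAILED }

    /**
     * Decides a final check on a suspect member.
     *
     * @param probeSucceeded     assumed result of a direct availability probe
     * @param lastTrafficMillis  time the last message from the suspect was seen
     * @param nowMillis          current time
     * @param recentWindowMillis assumed threshold for "recent" traffic
     */
    static Outcome finalCheck(boolean probeSucceeded, long lastTrafficMillis,
                              long nowMillis, long recentWindowMillis) {
        if (probeSucceeded) {
            return Outcome.PASSED;
        }
        // "Final check failed but detected recent message traffic": recent
        // traffic from the suspect overrides the failed probe.
        boolean recentTraffic = nowMillis - lastTrafficMillis <= recentWindowMillis;
        return recentTraffic ? Outcome.PASSED : Outcome.FAILED;
    }

    public static void main(String[] args) {
        long now = 10_000;
        System.out.println(finalCheck(false, now - 500, now, 1_000));   // prints PASSED
        System.out.println(finalCheck(false, now - 5_000, now, 1_000)); // prints FAILED
    }
}
```

A PASSED outcome here only means the member is not removed at this point; per the report, the member must remain subject to later suspect processing, which is what did not happen.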



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
