[jira] [Commented] (GEODE-5307) Hang with servers all in waitForPrimaryMember and one server in NO_PRIMARY_HOSTING state

ASF subversion and git services (JIRA) Mon, 11 Jun 2018 13:41:24 -0700


    [ 
https://issues.apache.org/jira/browse/GEODE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508706#comment-16508706
 ]


ASF subversion and git services commented on GEODE-5307:
--------------------------------------------------------

Commit 3ed33a162a11f7f2600f97db4983e820929dd9f3 in geode's branch 
refs/heads/develop from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=3ed33a1 ]

GEODE-5307 Hang with servers all in waitForPrimaryMember and one server in 
NO_PRIMARY_HOSTING state

Ignore the primaryElector if it is no longer known to the RegionAdvisor.
This means that the elector has somehow gone away - either it crashed,
shut down or destroyed its region.
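
The check described above can be sketched roughly as follows. This is a
simplified, hypothetical model and not the code from the commit: the class
PrimaryElectorSketch, its methods, and the use of plain strings as member
ids are all assumptions made for illustration. Only the idea is taken from
the commit message: an elector that is no longer known to the advisor is
ignored, so the remaining member is free to volunteer.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; these are NOT Geode's actual classes.
class PrimaryElectorSketch {
    // Stand-in for the set of members the RegionAdvisor still knows about.
    private final Set<String> knownMembers = new HashSet<>();
    // Member that asked this server to manage the bucket, or null if none.
    private String primaryElector;

    void memberJoined(String id) { knownMembers.add(id); }

    // Called when a member crashes, shuts down, or destroys its region.
    void memberDeparted(String id) { knownMembers.remove(id); }

    void setPrimaryElector(String id) { primaryElector = id; }

    // Returns true if this member may volunteer to become primary.
    boolean mayVolunteerForPrimary() {
        // Ignore an elector that the advisor no longer knows about.
        if (primaryElector != null && !knownMembers.contains(primaryElector)) {
            primaryElector = null;
        }
        return primaryElector == null;
    }
}
```

In this sketch a server defers to its elector only while that elector is
still a known member. Once the elector departs, mayVolunteerForPrimary()
returns true instead of leaving the bucket without a primary.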


> Hang with servers all in waitForPrimaryMember and one server in 
> NO_PRIMARY_HOSTING state
> ----------------------------------------------------------------------------------------
>
>                 Key: GEODE-5307
>                 URL: https://issues.apache.org/jira/browse/GEODE-5307
>             Project: Geode
>          Issue Type: Bug
>          Components: regions
>    Affects Versions: 1.1.0, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I've run into a hang in a test where servers are continuously creating PRs, 
> doing putAll ops on them and closing/local-destroying the PR.  Sometimes the 
> servers hang with any thread needing a particular bucket in 
> waitForPrimaryMember().
> This seems to happen because of this sequence of events:
> 1. two servers create a partitioned region
> 2. one server initiates a putAll and requests the other server manage a bucket
> 3. the putAll server closes or locally-destroys its region
> 4. the close() operation completes
> 5. the other server initializes its bucket and still uses the requesting 
> server as a primaryElector. This keeps it from deciding to volunteer to 
> become primary.
> The problem is that when the requesting server closed its region, exceptions 
> were thrown in its putAll thread, which then abandoned creation of the 
> bucket. No one will ever trip the switch that makes the other server become 
> the primary for the bucket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
