[ https://issues.apache.org/jira/browse/GEODE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508331#comment-16508331 ]

ASF subversion and git services commented on GEODE-5307:
--------------------------------------------------------

Commit c359ff24b7b812f58e66e60d960c8b1404407795 in geode's branch 
refs/heads/feature/GEODE-5307 from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c359ff2 ]

GEODE-5307 Hang with servers all in waitForPrimaryMember

Precheckin testing showed that the original commit on this branch broke
partitioning because it did not check whether the primaryElector was the
member creating the bucket. In that case there is no Profile for it in the
RegionAdvisor, so the check would incorrectly allow
BucketAdvisor.volunteerForPrimary() to proceed. The fix is to check for that
case as well.


> Hang with servers all in waitForPrimaryMember and one server in 
> NO_PRIMARY_HOSTING state
> ----------------------------------------------------------------------------------------
>
>                 Key: GEODE-5307
>                 URL: https://issues.apache.org/jira/browse/GEODE-5307
>             Project: Geode
>          Issue Type: Bug
>          Components: regions
>    Affects Versions: 1.1.0, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I've run into a hang in a test where servers are continuously creating PRs,
> doing putAll ops on them and closing or locally destroying the PR. Sometimes
> the servers hang, with any thread that needs a particular bucket stuck in
> waitForPrimaryMember().
> This seems to happen because of this sequence of events:
> 1. two servers create a partitioned region
> 2. one server initiates a putAll and requests that the other server manage a bucket
> 3. the putAll server closes or locally destroys its region
> 4. the close() operation completes
> 5. the other server initializes its bucket and still uses the requesting
> server as its primaryElector. This keeps it from deciding to volunteer to
> become primary.
> The problem is that the server that closed its region caused exceptions to be
> thrown in its putAll thread, which then abandoned creation of the bucket.
> No one will ever trip the switch that makes the other server become the
> primary for the bucket.
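The sequence above can be reproduced with a toy, single-JVM model. Everything in it (`BucketHangModel`, `volunteerOld`, `volunteerFixed`, `waitForPrimary`) is hypothetical and is not how Geode's BucketAdvisor is implemented; a `CountDownLatch` with a timeout stands in for waitForPrimaryMember(). It assumes the fix lets the bucket host stop deferring once the elector's region is gone, and it ignores the case where the elector is the local member.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy model of the hang; all names are hypothetical, not Geode's implementation.
class BucketHangModel {
    final Set<String> liveMembers = new HashSet<>();   // members still hosting the region
    final CountDownLatch primaryChosen = new CountDownLatch(1);
    volatile String primary;                            // null until someone becomes primary
    String primaryElector;                              // member expected to pick the primary

    // Old behaviour: defer to any elector, even one whose region is gone.
    void volunteerOld(String self) {
        if (primaryElector != null) {
            return; // wait for the elector, which may never act
        }
        becomePrimary(self);
    }

    // Assumed fixed behaviour: stop deferring once the elector has left.
    void volunteerFixed(String self) {
        if (primaryElector != null && liveMembers.contains(primaryElector)) {
            return;
        }
        primaryElector = null;
        becomePrimary(self);
    }

    void becomePrimary(String self) {
        primary = self;
        primaryChosen.countDown();
    }

    // Bounded stand-in for waitForPrimaryMember(): null means nobody became primary.
    String waitForPrimary(long millis) {
        try {
            return primaryChosen.await(millis, TimeUnit.MILLISECONDS) ? primary : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    static String run(boolean fixed) {
        BucketHangModel bucket = new BucketHangModel();
        bucket.liveMembers.add("A");
        bucket.liveMembers.add("B");
        // Steps 1-2: A starts a putAll and asks B to host a bucket, naming itself elector.
        bucket.primaryElector = "A";
        // Steps 3-4: A closes its region; its putAll thread abandons bucket creation.
        bucket.liveMembers.remove("A");
        // Step 5: B initializes the bucket and decides whether to volunteer.
        if (fixed) {
            bucket.volunteerFixed("B");
        } else {
            bucket.volunteerOld("B");
        }
        return bucket.waitForPrimary(100);
    }

    public static void main(String[] args) {
        System.out.println("old:   " + run(false)); // null: nobody volunteers
        System.out.println("fixed: " + run(true));  // B becomes primary
    }
}
```

In the old path the wait only ends because of the timeout; in a real server with no timeout, every thread needing the bucket would block indefinitely.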



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
