GitHub user nabarunnag opened a pull request:

    https://github.com/apache/geode/pull/732

    GEODE-3276: Managing race conditions while the senders are stopped

        * When a connection is initialized, a readAckThread may still be alive from a previous incarnation.
        * This ack thread is blocked on a socket read with no timeout, because nothing was dispatched.
        * While it is blocked on the read, it holds the connection lifecycle read lock.
        * Connection initialization needs the connection lifecycle write lock to start the connection, but the read lock is still held by the ack thread.
        * This results in a deadlock and, eventually, a hang.
        * A second problem is that the isStopped flag is set on the event processor before the dispatcher and ack thread are actually shut down.
        * After the flag is set, and before the dispatcher and ackThread are shut down, a gateway proxy stomper thread can run between these two steps.
        * The stomper thread checks the isStopped flag, sees that it is true, and proceeds to destroy the connection pool while the dispatcher and ackThread are still running.
        * This results in an out-of-heap-memory exception when the ack thread reads from the socket after the connection pool has been destroyed.
        * To solve this, the stomper thread checks whether the event processor and dispatcher exist; if they do, it closes the input streams before destroying the connection pool.
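
The locking problem described in the first bullets can be reproduced in isolation. The following is a minimal sketch, not the Geode code: `AckThreadLockSketch`, `BlockingStream`, and `lifecycleLock` are hypothetical names, and `BlockingStream` is an assumed stand-in for a socket input stream with no read timeout. It shows a reader that holds the read lock while blocked in `read()`, a writer that cannot acquire the write lock in the meantime, and the lock becoming available once the stream is closed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AckThreadLockSketch {

  /** Hypothetical stand-in for a socket stream with no read timeout: read() blocks until close(). */
  static final class BlockingStream extends InputStream {
    private final CountDownLatch closed = new CountDownLatch(1);

    @Override
    public int read() throws IOException {
      try {
        closed.await(); // no data ever arrives, so wait until someone closes the stream
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      throw new IOException("stream closed");
    }

    @Override
    public void close() {
      closed.countDown(); // wakes up any thread blocked in read()
    }
  }

  /** Returns {write lock acquired before close, write lock acquired after close}. */
  static boolean[] run() throws InterruptedException {
    ReentrantReadWriteLock lifecycleLock = new ReentrantReadWriteLock();
    BlockingStream in = new BlockingStream();
    CountDownLatch readLockHeld = new CountDownLatch(1);

    // Plays the role of the stale ack reader thread from a previous incarnation.
    Thread ackReader = new Thread(() -> {
      lifecycleLock.readLock().lock();
      try {
        readLockHeld.countDown();
        in.read(); // blocks while still holding the read lock
      } catch (IOException expected) {
        // the stream was closed underneath the reader; fall through and release the lock
      } finally {
        lifecycleLock.readLock().unlock();
      }
    }, "ack-reader");
    ackReader.setDaemon(true);
    ackReader.start();
    readLockHeld.await();

    // Initialization needs the write lock, but the blocked reader still holds the read lock.
    boolean before = lifecycleLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
    if (before) {
      lifecycleLock.writeLock().unlock();
    }

    // Closing the stream unblocks the reader, which then releases the read lock.
    in.close();
    ackReader.join(2000);

    boolean after = lifecycleLock.writeLock().tryLock(2, TimeUnit.SECONDS);
    if (after) {
      lifecycleLock.writeLock().unlock();
    }
    return new boolean[] {before, after};
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] result = run();
    System.out.println("write lock before close: " + result[0]);
    System.out.println("write lock after close:  " + result[1]);
  }
}
```

The sketch uses a timed `tryLock` only so that the demonstration terminates; with a plain `lock()` call the initializing thread would wait for as long as the reader stays blocked, which is the hang described above.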
    
    Thank you for submitting a contribution to Apache Geode.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    - [ ] Does `gradlew build` run cleanly?
    
    - [ ] Have you written or updated unit tests to verify your changes?
    
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and
    submit an update to your PR as soon as possible. If you need help, please send an
    email to dev@geode.apache.org.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nabarunnag/incubator-geode feature/GEODE-3276

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/geode/pull/732.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #732
    
----
commit 9618f69d2620f5a3d3a6a8576631906a61b512f9
Author: nabarun <n...@pivotal.io>
Date:   2017-08-17T17:51:15Z

    GEODE-3276: Managing race conditions while the senders are stopped
    

----


