Here's the PR for this:
https://github.com/apache/curator/pull/297
I've run the tests a few times and it seems good. Another pair of eyes or two
would be appreciated.
-Jordan
> On Dec 13, 2018, at 2:09 PM, Jordan Zimmerman <[email protected]>
> wrote:
>
> FYI
>
> I was trying to figure out why TestInterProcessSemaphoreMutex is so flakey.
> In debugging the problem I've come across a fairly serious edge case in how
> several of our recipes work. You can see the issue in InterProcessSemaphoreV2
> (which is what InterProcessSemaphoreMutex uses internally). Have a look here:
>
>
> https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/InterProcessSemaphoreV2.java#L373
>
> The code synchronizes and then does "client.getChildren()...". This is where
> the problem is. If there are connection problems inside of getChildren() the
> retry policy will do configured sleeping, retries, etc. Importantly, this is
> all done while the thread doing the retires holds InterProcessSemaphoreV2's
> monitor. If the ZK connection is repaired past the session timeout, ZK will
> eventually call InterProcessSemaphoreV2's watcher with an Expired message.
> InterProcessSemaphoreV2's watcher calls this method:
>
> private synchronized void notifyFromWatcher()
> {
> notifyAll();
> }
>
> You can see that this is a race. The thread doing "getChildren" is holding
> the monitor and is in a retry loop waiting for the connection to be repaired.
> However, ZK's event loop is trying to obtain that same monitor as a result of
> trying to call the synchronized notifyFromWatcher(). This means that the
> retry policy will always fail because ZK's event loop is tied up until that
> thread exists. Worse still, if someone were to use a retry policy of
> "RetryForever" they'd have a deadlock.
>
> This pattern is in about 10 files or so. I'm trying to think of a workaround.
> One possibility is to use a separate thread for this type of notification.
> i.e. notifyFromWatcher() would just signal another thread that the
> notifyAll() needs to be called. This would unblock ZK's event thread so that
> things can progress. I'll play around with this.
>
> If anyone has a better idea or thoughts I'd appreciate it.
>
> -Jordan
>