Wow - 3 successful runs in a row on Jenkins. That's something we haven't seen with Curator for years. This is looking good.
> On Dec 14, 2018, at 7:52 AM, Jordan Zimmerman <[email protected]> > wrote: > > Here's the PR for this: > > https://github.com/apache/curator/pull/297 > > I've run the tests a few times and it seems good. Another pair of eyes or two > would be appreciated. > > -Jordan > >> On Dec 13, 2018, at 2:09 PM, Jordan Zimmerman <[email protected]> >> wrote: >> >> FYI >> >> I was trying to figure out why TestInterProcessSemaphoreMutex is so flakey. >> In debugging the problem I've come across a fairly serious edge case in how >> several of our recipes work. You can see the issue in >> InterProcessSemaphoreV2 (which is what InterProcessSemaphoreMutex uses >> internally). Have a look here: >> >> >> https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/InterProcessSemaphoreV2.java#L373 >> >> The code synchronizes and then does "client.getChildren()...". This is where >> the problem is. If there are connection problems inside of getChildren() the >> retry policy will do configured sleeping, retries, etc. Importantly, this is >> all done while the thread doing the retires holds InterProcessSemaphoreV2's >> monitor. If the ZK connection is repaired past the session timeout, ZK will >> eventually call InterProcessSemaphoreV2's watcher with an Expired message. >> InterProcessSemaphoreV2's watcher calls this method: >> >> private synchronized void notifyFromWatcher() >> { >> notifyAll(); >> } >> >> You can see that this is a race. The thread doing "getChildren" is holding >> the monitor and is in a retry loop waiting for the connection to be >> repaired. However, ZK's event loop is trying to obtain that same monitor as >> a result of trying to call the synchronized notifyFromWatcher(). This means >> that the retry policy will always fail because ZK's event loop is tied up >> until that thread exists. Worse still, if someone were to use a retry policy >> of "RetryForever" they'd have a deadlock. >> >> This pattern is in about 10 files or so. I'm trying to think of a >> workaround. One possibility is to use a separate thread for this type of >> notification. i.e. notifyFromWatcher() would just signal another thread that >> the notifyAll() needs to be called. This would unblock ZK's event thread so >> that things can progress. I'll play around with this. >> >> If anyone has a better idea or thoughts I'd appreciate it. >> >> -Jordan >> >
