[
https://issues.apache.org/jira/browse/CURATOR-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856870#comment-16856870
]
Jordan Zimmerman commented on CURATOR-527:
------------------------------------------
FYI - just in case I wrote a quick test to verify. I added these fields to
LockInternals:
{code}
static volatile CountDownLatch debugAfterGetDataLatchWaiter = null;
static volatile CountDownLatch debugAfterGetDataLatchReady = null;
{code}
Then, in LockInternals .internalLockLoop(), I added this:
{code}
client.getData().usingWatcher(watcher).forPath(previousSequencePath);
if ( debugAfterGetDataLatchWaiter != null )
{
debugAfterGetDataLatchReady.countDown();
debugAfterGetDataLatchWaiter.await();
}
{code}
This has the effect of (when the debug latches are set)
{{debugAfterGetDataLatchReady}} will count down after getData is called and the
block on {{debugAfterGetDataLatchWaiter}}.
With this in place, I wrote this test in TestInterProcessMutexBase:
{code}
@Test
public void testDeletionBetweenGetDataAndWait() throws Exception {
final Timing timing = new Timing();
final CuratorFramework client =
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(),
timing.connection(), new ExponentialBackoffRetry(100, 3));
try
{
client.start();
InterProcessLock lock1 = makeLock(client);
InterProcessLock lock2 = makeLock(client);
CountDownLatch lock1latchReady = new CountDownLatch(1);
CountDownLatch lock1latchWait = new CountDownLatch(1);
CompletableFuture.runAsync((() -> {
try
{
lock1.acquire();
System.out.println("Lock1 acquired - waiting");
lock1latchReady.countDown();
lock1latchWait.await();
}
catch ( Exception e )
{
throw new RuntimeException(e);
}
finally
{
try
{
lock1.release();
System.out.println("Lock1 released");
}
catch ( Exception e )
{
// ignore
}
}
}));
timing.awaitLatch(lock1latchReady);
LockInternals.debugAfterGetDataLatchReady = new CountDownLatch(1);
LockInternals.debugAfterGetDataLatchWaiter = new CountDownLatch(1);
CompletableFuture<Void> lock2Future =
CompletableFuture.runAsync((() -> {
try
{
System.out.println("Pre lock2 acquired");
lock2.acquire();
System.out.println("Post lock2 acquired");
}
catch ( Exception e )
{
throw new RuntimeException(e);
}
}));
System.out.println("Wait a bit 1");
timing.sleepABit();
System.out.println("Tell lock 1 to release");
lock1latchWait.countDown();
System.out.println("Wait a bit 2");
timing.sleepABit();
System.out.println("debugAfterGetDataLatchWaiter count down");
LockInternals.debugAfterGetDataLatchWaiter.countDown();
try
{
lock2Future.get(timing.forWaiting().milliseconds(),
TimeUnit.MILLISECONDS);
}
catch ( Exception e )
{
Assert.fail("lock2 did not acquire");
}
}
finally
{
CloseableUtils.closeQuietly(client);
}
}
{code}
It creates a lock in a thread, holds that lock and waits for it to acquire.
Then, it sets the debug latches, and tries to acquire a second lock on the same
path. It waits for the {{debugAfterGetDataLatchReady}} to be ready thus
ensuring the second lock's thread is blocked after getData and before wait().
It then signals the lock1 thread to release the lock. After waiting a bit, it
counts down the debug latch so that lock2 can proceed. Due to the watcher being
called, as soon as lock2's thread waits, the watcher will notify it and it will
return, thus continuing the lock process.
The test succeeds.
> Concurrency issue in LockInternals
> ----------------------------------
>
> Key: CURATOR-527
> URL: https://issues.apache.org/jira/browse/CURATOR-527
> Project: Apache Curator
> Issue Type: Bug
> Components: Recipes
> Affects Versions: 2.12.0
> Environment: Curator 2.12.0
> zookeeper 3.4.14
> Reporter: Kim Jaechang
> Priority: Major
>
> I'm using InterProcessMutex and InterProcessMutex often failed to acquire
> lock.
> In LockInternals.internalLockLoop(), watcher is registered to zookeeper and
> call wait() like below
> {code:java}
> client.getData().usingWatcher(watcher).forPath(previousSequencePath);
> if ( millisToWait != null )
> {
> millisToWait -= (System.currentTimeMillis() - startMillis);
> startMillis = System.currentTimeMillis();
> if ( millisToWait <= 0 )
> {
> doDelete = true; // timed out - delete our node
> break;
> }
> wait(millisToWait);
> }
> else
> {
> wait();
> }
> {code}
> In my case, my program is waiting
> previousSequencePath=_c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759
> to be deleted.
> But _c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759 is deleted
> between client.getData() and wait().
> if _c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759 is deleted when
> client.getData().usingWatcher(watcher).forPath(previousSequencePath) is
> called, it will throw Exception but it was exist at that time.
> I'm using Curator 2.12.0 but latest version seems to have same issue.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)