[ 
https://issues.apache.org/jira/browse/CURATOR-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856870#comment-16856870
 ] 

Jordan Zimmerman commented on CURATOR-527:
------------------------------------------

FYI - just in case I wrote a quick test to verify. I added these fields to 
LockInternals:

{code}
    static volatile CountDownLatch debugAfterGetDataLatchWaiter = null;
    static volatile CountDownLatch debugAfterGetDataLatchReady = null;
{code}

Then, in LockInternals .internalLockLoop(), I added this:

{code}
                            
client.getData().usingWatcher(watcher).forPath(previousSequencePath);

                            if ( debugAfterGetDataLatchWaiter != null )
                            {
                                debugAfterGetDataLatchReady.countDown();
                                debugAfterGetDataLatchWaiter.await();
                            }
{code}

This has the effect of (when the debug latches are set) 
{{debugAfterGetDataLatchReady}} will count down after getData is called and the 
block on {{debugAfterGetDataLatchWaiter}}. 

With this in place, I wrote this test in TestInterProcessMutexBase:

{code}
    @Test
    public void testDeletionBetweenGetDataAndWait() throws Exception {
        final Timing timing = new Timing();
        final CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new ExponentialBackoffRetry(100, 3));
        try
        {
            client.start();

            InterProcessLock lock1 = makeLock(client);
            InterProcessLock lock2 = makeLock(client);

            CountDownLatch lock1latchReady = new CountDownLatch(1);
            CountDownLatch lock1latchWait = new CountDownLatch(1);
            CompletableFuture.runAsync((() -> {
                try
                {
                    lock1.acquire();
                    System.out.println("Lock1 acquired - waiting");
                    lock1latchReady.countDown();
                    lock1latchWait.await();
                }
                catch ( Exception e )
                {
                    throw new RuntimeException(e);
                }
                finally
                {
                    try
                    {
                        lock1.release();
                        System.out.println("Lock1 released");
                    }
                    catch ( Exception e )
                    {
                        // ignore
                    }
                }
            }));

            timing.awaitLatch(lock1latchReady);

            LockInternals.debugAfterGetDataLatchReady = new CountDownLatch(1);
            LockInternals.debugAfterGetDataLatchWaiter = new CountDownLatch(1);

            CompletableFuture<Void> lock2Future = 
CompletableFuture.runAsync((() -> {
                try
                {
                    System.out.println("Pre lock2 acquired");
                    lock2.acquire();
                    System.out.println("Post lock2 acquired");
                }
                catch ( Exception e )
                {
                    throw new RuntimeException(e);
                }
            }));

            System.out.println("Wait a bit 1");
            timing.sleepABit();
            System.out.println("Tell lock 1 to release");
            lock1latchWait.countDown();
            System.out.println("Wait a bit 2");
            timing.sleepABit();
            System.out.println("debugAfterGetDataLatchWaiter count down");
            LockInternals.debugAfterGetDataLatchWaiter.countDown();

            try
            {
                lock2Future.get(timing.forWaiting().milliseconds(), 
TimeUnit.MILLISECONDS);
            }
            catch ( Exception e )
            {
                Assert.fail("lock2 did not acquire");
            }
        }
        finally
        {
            CloseableUtils.closeQuietly(client);
        }
    }
{code}

It creates a lock in a thread, holds that lock and waits for it to acquire. 
Then, it sets the debug latches, and tries to acquire a second lock on the same 
path. It waits for the {{debugAfterGetDataLatchReady}} to be ready thus 
ensuring the second lock's thread is blocked after getData and before wait(). 
It then signals the lock1 thread to release the lock. After waiting a bit, it 
counts down the debug latch so that lock2 can proceed. Due to the watcher being 
called, as soon as lock2's thread waits, the watcher will notify it and it will 
return, thus continuing the lock process. 

The test succeeds.

> Concurrency issue in LockInternals
> ----------------------------------
>
>                 Key: CURATOR-527
>                 URL: https://issues.apache.org/jira/browse/CURATOR-527
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.12.0
>         Environment: Curator 2.12.0
> zookeeper 3.4.14
>            Reporter: Kim Jaechang
>            Priority: Major
>
> I'm using InterProcessMutex and InterProcessMutex often failed to acquire 
> lock.
> In LockInternals.internalLockLoop(), watcher is registered to zookeeper and 
> call wait() like below
> {code:java}
> client.getData().usingWatcher(watcher).forPath(previousSequencePath);
> if ( millisToWait != null )
> {
>     millisToWait -= (System.currentTimeMillis() - startMillis);
>     startMillis = System.currentTimeMillis();
>     if ( millisToWait <= 0 )
>     {
>         doDelete = true;    // timed out - delete our node
>         break;
>     }
>     wait(millisToWait);
> }
> else
> {
>     wait();
> }
> {code}
> In my case, my program is waiting 
> previousSequencePath=_c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759 
> to be deleted.
> But _c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759 is deleted 
> between client.getData() and wait().
> if _c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759 is deleted when 
> client.getData().usingWatcher(watcher).forPath(previousSequencePath) is 
> called, it will throw Exception but it was exist at that time.
> I'm using Curator 2.12.0 but latest version seems to have same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to