Juan José Ramos Cassella created GEODE-7062:
-----------------------------------------------

             Summary: CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks
                 Key: GEODE-7062
                 URL: https://issues.apache.org/jira/browse/GEODE-7062
             Project: Geode
          Issue Type: Bug
          Components: tests
            Reporter: Juan José Ramos Cassella


The test {{testSuspendLockingBlocksUntilNoLocks}} from class 
{{DistributedLockServiceDUnitTest}} failed twice in CI runs 
[967|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/967]
 and 
[969|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/969].
Results for the first failure are available 
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565222926/]
 and for the second one 
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565246507/].
Archived artifacts for the first failure are available 
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565222926/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz]
 and for the second one 
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565246507/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz].

The issue appears to be a race condition when launching an asynchronous thread 
on a remote {{VM}} through the following code:
{code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
    VM vm1 = getVM(1);
    vm1.invokeAsync(new SerializableRunnable("Lock & unlock in vm1") {
      @Override
      public void run() {
        DistributedLockService service2 = getServiceNamed(name);
        assertThat(service2.lock("lock", -1, -1)).isTrue();
        synchronized (monitor) {
          try {
            monitor.wait();
          } catch (InterruptedException ex) {
            out.println("Unexpected InterruptedException");
            fail("interrupted");
          }
        }
        service2.unlock("lock");
      }
    });
    // Let vm1's thread get the lock and go into wait()
    sleep(100);
{code}

If the remote thread has not acquired the lock by the time the 100 millisecond 
sleep ends, the test fails: no lock is held yet, so the thread on the local 
{{VM}} returns from {{suspendLocking}} right away instead of blocking:
{code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
    Thread thread = new Thread(new Runnable() {
      @Override
      public void run() {
        setGot(service.suspendLocking(-1));
        setDone(true);
        service.resumeLocking();
      }
    });
    setGot(false);
    setDone(false);
    thread.start();

    // Let thread start, make sure it's blocked in suspendLocking
    sleep(100);
    assertThat(getGot() || getDone())
        .withFailMessage("Before release, got: " + getGot() + ", done: " + getDone())
        .isFalse();
{code}

Increasing the sleep time might reduce recurrences of the issue. Another 
option would be to investigate how to make the test wait *until* the 
asynchronous invocation has started on the remote {{VM}} instead of 
arbitrarily sleeping for 100 milliseconds.
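
As a rough illustration of the second option, the sketch below replaces a fixed sleep with an explicit "lock acquired" signal. It is not the Geode test code: it runs in a single JVM, uses a {{ReentrantLock}} as a stand-in for the distributed lock, and the class and method names ({{LatchHandshakeSketch}}, {{lockHeldAfterHandshake}}) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-JVM sketch; names are illustrative, not from the Geode code base.
public class LatchHandshakeSketch {

  // Returns true if the caller observed the lock as held after the handshake.
  public static boolean lockHeldAfterHandshake() {
    ReentrantLock lock = new ReentrantLock(); // stand-in for the distributed lock
    CountDownLatch lockAcquired = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    Thread holder = new Thread(() -> {
      lock.lock();
      try {
        lockAcquired.countDown(); // signal: the lock is now held
        release.await(); // keep holding the lock until told to release it
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.unlock();
      }
    });
    holder.start();

    try {
      // Wait for the explicit signal instead of sleeping for a fixed 100 ms.
      if (!lockAcquired.await(30, TimeUnit.SECONDS)) {
        release.countDown();
        throw new IllegalStateException("holder never acquired the lock");
      }

      // The holder is guaranteed to own the lock here, so tryLock() must fail.
      boolean acquiredByCaller = lock.tryLock();
      if (acquiredByCaller) {
        lock.unlock();
      }

      release.countDown(); // let the holder unlock and finish
      holder.join();
      return !acquiredByCaller;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("lock held after handshake: " + lockHeldAfterHandshake());
  }
}
```

A {{CountDownLatch}} cannot be shared across DUnit VMs, so in the real test the signal would have to cross the process boundary in some other way; how best to do that is left open here.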



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
