Juan José Ramos Cassella created GEODE-7062:
-----------------------------------------------
Summary: CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks
Key: GEODE-7062
URL: https://issues.apache.org/jira/browse/GEODE-7062
Project: Geode
Issue Type: Bug
Components: tests
Reporter: Juan José Ramos Cassella
The test {{testSuspendLockingBlocksUntilNoLocks}} from class
{{DistributedLockServiceDUnitTest}} failed twice in CI runs
[967|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/967]
and
[969|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/969].
Results for the first failure are available
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565222926/]
and for the second one
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565246507/].
Archived artifacts for the first failure are available
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565222926/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz]
and for the second one
[here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565246507/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz].
The issue appears to be a race condition when launching an asynchronous thread
on a remote {{VM}} through the following code:
{code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
VM vm1 = getVM(1);
vm1.invokeAsync(new SerializableRunnable("Lock & unlock in vm1") {
  @Override
  public void run() {
    DistributedLockService service2 = getServiceNamed(name);
    assertThat(service2.lock("lock", -1, -1)).isTrue();
    synchronized (monitor) {
      try {
        monitor.wait();
      } catch (InterruptedException ex) {
        out.println("Unexpected InterruptedException");
        fail("interrupted");
      }
    }
    service2.unlock("lock");
  }
});
// Let vm1's thread get the lock and go into wait()
sleep(100);
{code}
If the thread has not yet been launched on the remote {{VM}} (and so has not
acquired the lock) by the time the 100 millisecond sleep ends, the test fails
because the thread on the local {{VM}} is able to invoke {{suspendLocking}}
right away:
{code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
Thread thread = new Thread(new Runnable() {
  @Override
  public void run() {
    setGot(service.suspendLocking(-1));
    setDone(true);
    service.resumeLocking();
  }
});
setGot(false);
setDone(false);
thread.start();
// Let thread start, make sure it's blocked in suspendLocking
sleep(100);
assertThat(getGot() || getDone())
    .withFailMessage("Before release, got: " + getGot() + ", done: " + getDone())
    .isFalse();
{code}
Increasing the sleep time might reduce recurrences of the issue. Another option
would be to investigate how to make the test wait *until* the asynchronous
invocation has started on the remote {{VM}} instead of arbitrarily sleeping for
100 milliseconds.
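As a rough illustration of the second option, the sketch below polls a condition with a timeout instead of sleeping for a fixed interval. It is only a stand-in: it uses plain in-process threads, and the class {{AwaitInsteadOfSleep}}, the helper {{awaitCondition}} and the {{lockHeld}} flag are hypothetical names that do not exist in the test. In the real DUnit test the lock is taken in a different {{VM}}, so the condition would have to be evaluated remotely, which this sketch does not attempt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical, stand-alone sketch; none of these names come from Geode.
class AwaitInsteadOfSleep {

  /** Polls the condition until it holds or the timeout elapses; returns whether it held. */
  static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out without seeing the condition
      }
      try {
        Thread.sleep(10); // short poll interval, bounded by the overall timeout
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    return true;
  }

  /** Starts a worker that "takes the lock", then waits until that is actually observed. */
  static boolean demo() {
    AtomicBoolean lockHeld = new AtomicBoolean(false);
    CountDownLatch release = new CountDownLatch(1);

    Thread worker = new Thread(() -> {
      lockHeld.set(true); // stands in for service2.lock("lock", -1, -1) succeeding
      try {
        release.await(); // stands in for monitor.wait()
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      lockHeld.set(false); // stands in for service2.unlock("lock")
    });
    worker.start();

    // Instead of sleep(100): proceed only once the worker is known to hold the lock.
    boolean observed = awaitCondition(lockHeld::get, 5_000);

    release.countDown(); // let the worker finish
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while joining", e);
    }
    return observed;
  }

  public static void main(String[] args) {
    System.out.println("lock observed before proceeding: " + demo());
  }
}
```

With this shape the main thread continues as soon as the condition holds, and a slow start of the worker only delays the test rather than failing it, as long as the start happens within the timeout.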