[
https://issues.apache.org/jira/browse/IGNITE-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954060#comment-16954060
]
Nikolai Kulagin commented on IGNITE-11087:
------------------------------------------
The test task is very short, so CheckpointRequestListener, which handles the
message about saving a checkpoint, and the #onSessionEnd method, which runs
after the task finishes, can execute concurrently. At some moment the task node
adds the sessionId to closedSess, and the listener then finds the sessionId
there. The task node removes the session's key set from keyMap and removes the
checkpoint for each key in it.
{code:java}
closedSess.add(ses.getId());
// If on task node.
if (ses.getJobId() == null) {
    Set<String> keys = keyMap.remove(ses.getId());
    if (keys != null) {
        for (String key : keys)
            getSpi(ses.getCheckpointSpi()).removeCheckpoint(key);
    }
}{code}
The listener also removes the key set from keyMap and removes the checkpoint,
even if the key set was no longer in the map.
{code:java}
if (closedSess.contains(sesId)) {
    keyMap.remove(sesId, keys);
    getSpi(req.getCheckpointSpi()).removeCheckpoint(req.getKey());
}{code}
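The interleaving described above can be replayed in a single thread to show
where the extra call comes from. This is only a sketch under assumptions, not
Ignite code: the class name, the plain JDK collections, and the counter that
stands in for removeCheckpoint are all hypothetical; only the names closedSess
and keyMap are taken from the snippets.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single-threaded replay of the race; not Ignite code. */
class DoubleRemovalSketch {
    /** Replays the interleaving and returns how many removals were issued. */
    static int replay() {
        Set<String> closedSess = ConcurrentHashMap.newKeySet();
        ConcurrentMap<String, Set<String>> keyMap = new ConcurrentHashMap<>();
        AtomicInteger removeCalls = new AtomicInteger(); // stands in for removeCheckpoint(key)

        String sesId = "ses-1";
        String key = "cp-1";

        // Listener: registers the checkpoint key for the session.
        Set<String> keys = keyMap.computeIfAbsent(sesId, id -> ConcurrentHashMap.newKeySet());
        keys.add(key);

        // Task node (#onSessionEnd): closes the session and removes every known key.
        closedSess.add(sesId);
        Set<String> removed = keyMap.remove(sesId);
        if (removed != null)
            removeCalls.addAndGet(removed.size()); // one removal per key

        // Listener resumes: the session is closed, so it removes unconditionally.
        if (closedSess.contains(sesId)) {
            keyMap.remove(sesId, keys); // returns false here, but the result is ignored
            removeCalls.incrementAndGet();
        }

        return removeCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("removal calls: " + replay());
    }
}
```

In this replay the checkpoint for the single key is removed twice, once by each
side, which matches the "called once more" failure.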
To fix this, the listener needs to check that keyMap still contains the key set
before removing the checkpoint, and delete the checkpoint only if the entry was
found.
{code:java}
if (closedSess.contains(sesId)) {
    if (keyMap.remove(sesId, keys))
        getSpi(req.getCheckpointSpi()).removeCheckpoint(req.getKey());
}
{code}
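The guard relies on the two-argument remove reporting whether it actually
removed the mapping. A minimal sketch of that behavior, assuming keyMap behaves
like a java.util.concurrent.ConcurrentMap (the class name and the sample values
are hypothetical):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical demo of ConcurrentMap.remove(key, value); not Ignite code. */
class ConditionalRemoveSketch {
    /** Returns the results of two consecutive conditional removals. */
    static boolean[] demo() {
        ConcurrentMap<String, Set<String>> keyMap = new ConcurrentHashMap<>();

        Set<String> keys = ConcurrentHashMap.newKeySet();
        keys.add("cp-1");
        keyMap.put("ses-1", keys);

        // The mapping is present, so this call removes it and reports success.
        boolean first = keyMap.remove("ses-1", keys);

        // The mapping is already gone, so this call reports failure.
        boolean second = keyMap.remove("ses-1", keys);

        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] res = demo();
        System.out.println("first=" + res[0] + ", second=" + res[1]);
    }
}
```

Only the caller that wins the removal sees true, so only one side goes on to
remove the checkpoint.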
After this fix a new bug appears.
Between creating the new key set and adding the checkpoint key in the listener,
{code:java}
    Set<String> old = keyMap.putIfAbsent(sesId, (CheckpointSet)(keys = new CheckpointSet(ses)));
    if (old != null)
        keys = old;
}
// <-------------- here
keys.add(req.getKey());
{code}
the task node adds the session to closedSess and removes the still-empty key
set for the session. It finds no keys (because the listener has not added the
key yet) and so does not remove the checkpoint.
{code:java}
Set<String> keys = keyMap.remove(ses.getId());
if (keys != null) {
    for (String key : keys){code}
After adding the key, the listener does not find its key set in keyMap and does
not remove the checkpoint either.
{code:java}
if (closedSess.contains(sesId)) {
    if (keyMap.remove(sesId, keys)){code}
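This second interleaving can also be replayed in a single thread. As before,
this is a sketch under assumptions rather than Ignite code: the class name,
the plain JDK collections used in place of CheckpointSet, and the counter that
stands in for removeCheckpoint are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single-threaded replay of the second race; not Ignite code. */
class LostRemovalSketch {
    /** Replays the interleaving and returns how many removals were issued. */
    static int replay() {
        Set<String> closedSess = ConcurrentHashMap.newKeySet();
        ConcurrentMap<String, Set<String>> keyMap = new ConcurrentHashMap<>();
        AtomicInteger removeCalls = new AtomicInteger(); // stands in for removeCheckpoint(key)

        String sesId = "ses-1";
        String key = "cp-1";

        // Listener: publishes an empty key set, then is paused before adding the key.
        Set<String> keys = ConcurrentHashMap.newKeySet();
        Set<String> old = keyMap.putIfAbsent(sesId, keys);
        if (old != null)
            keys = old;

        // Task node (#onSessionEnd): closes the session and finds an empty key set.
        closedSess.add(sesId);
        Set<String> removed = keyMap.remove(sesId);
        if (removed != null)
            removeCalls.addAndGet(removed.size()); // adds 0, the set is still empty

        // Listener resumes: adds the key, but its key set is no longer in keyMap.
        keys.add(key);
        if (closedSess.contains(sesId)) {
            if (keyMap.remove(sesId, keys))
                removeCalls.incrementAndGet(); // not reached in this interleaving
        }

        return removeCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("removal calls: " + replay());
    }
}
```

Neither side issues a removal here, so the checkpoint is left behind, which
matches the "isn't called" failure.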
> GridJobCheckpointCleanupSelfTest.testCheckpointCleanup is flaky
> ---------------------------------------------------------------
>
> Key: IGNITE-11087
> URL: https://issues.apache.org/jira/browse/IGNITE-11087
> Project: Ignite
> Issue Type: Bug
> Reporter: Nikolai Kulagin
> Assignee: Nikolai Kulagin
> Priority: Minor
> Labels: MakeTeamcityGreenAgain
> Attachments: #removeCheckpoint is called once more.txt,
> #removeCheckpoint isn't called.txt
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The method that removes a checkpoint is sometimes not called, or is called one
> time too many. The test has a very low failure rate: 1 per 366 runs on
> [TeamCity|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-7655052229521669617&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E]
> and 1 per 412 on TC Bot. On a local machine it fails approximately once per
> 100 runs. Logs are attached.
> The test has been flaky for a long time. Before the IP Finder was replaced in
> IGNITE-10555, the test was slower, which made the failure rate even lower.
>
> {code:java}
> [2019-01-25 14:49:03,050][ERROR][main][root] Test failed.
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
> at junit.framework.Assert.fail(Assert.java:57)
> at junit.framework.Assert.failNotEquals(Assert.java:329)
> at junit.framework.Assert.assertEquals(Assert.java:78)
> at junit.framework.Assert.assertEquals(Assert.java:234)
> at junit.framework.Assert.assertEquals(Assert.java:241)
> at
> org.apache.ignite.internal.GridJobCheckpointCleanupSelfTest.testCheckpointCleanup(GridJobCheckpointCleanupSelfTest.java:88)
> at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2088)
> at java.lang.Thread.run(Thread.java:748){code}
>
> [^#removeCheckpoint isn't called.txt]
> ^_____________________________________________________________________^
>
> {code:java}
> [2019-01-25 14:50:03,282][ERROR][main][root] Test failed.
> junit.framework.AssertionFailedError: expected:<-1> but was:<0>
> at junit.framework.Assert.fail(Assert.java:57)
> at junit.framework.Assert.failNotEquals(Assert.java:329)
> at junit.framework.Assert.assertEquals(Assert.java:78)
> at junit.framework.Assert.assertEquals(Assert.java:234)
> at junit.framework.Assert.assertEquals(Assert.java:241)
> at
> org.apache.ignite.internal.GridJobCheckpointCleanupSelfTest.testCheckpointCleanup(GridJobCheckpointCleanupSelfTest.java:88)
> at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2088)
> at java.lang.Thread.run(Thread.java:748){code}
> [^#removeCheckpoint is called once more.txt]