[
https://issues.apache.org/jira/browse/IGNITE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743399#comment-16743399
]
Oleg Ignatenko edited comment on IGNITE-10518 at 1/15/19 9:48 PM:
------------------------------------------------------------------
(x) TeamCity history for the reproducer
([IgniteTxCachePrimarySyncTest.testSingleKeyCommitFromPrimary|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=4989034880085631279&tab=testDetails])
suggests that the problem has not been fixed at all: I checked the last
100 execution results, covering about 30 days since Dec 16 2018, and every one
of them, without exception, shows the same "muted failure" result:
{noformat}
Test status Duration Build Info Changes Agent
Muted failure 18ms … MVCC Cache 9 pull/5823/head #1023 Tests passed: 10, muted: 9 andrey.mashenk… (2) 14 Jan 19 17:34 publicagent17_9096
Muted failure 12ms … MVCC Cache 9 refs/heads/master #1020 Tests passed: 10, muted: 9 No changes 14 Jan 19 14:10 publicagent07_9092
Muted failure 24ms … MVCC Cache 9 refs/heads/master #1019 Tests passed: 10, muted: 9 No changes 14 Jan 19 13:06 publicagent13_9096
Muted failure 17ms … MVCC Cache 9 refs/heads/master #1018 Tests passed: 10, muted: 9 Changes (2) 14 Jan 19 12:17 publicagent10_9092
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1017 Tests passed: 10, muted: 9 Changes (2) 14 Jan 19 11:16 publicagent14_9096
Muted failure 14ms … MVCC Cache 9 refs/heads/master #1016 Tests passed: 10, muted: 9 No changes 14 Jan 19 10:06 publicagent11_9092
Muted failure 15ms … MVCC Cache 9 refs/heads/master #1015 Tests passed: 10, muted: 9 No changes 14 Jan 19 09:17 publicagent10_9096
Muted failure 12ms … MVCC Cache 9 refs/heads/master #1014 Tests passed: 10, muted: 9 No changes 14 Jan 19 08:28 publicagent11_9092
Muted failure 25ms … MVCC Cache 9 refs/heads/master #1013 Tests passed: 10, muted: 9 No changes 14 Jan 19 07:36 publicagent17_9091
Muted failure 16ms … MVCC Cache 9 refs/heads/master #1012 Tests passed: 10, muted: 9 No changes 14 Jan 19 06:46 publicagent11_9096
Muted failure 26ms … MVCC Cache 9 refs/heads/master #1011 Tests passed: 10, muted: 9 No changes 14 Jan 19 05:56 publicagent09_9094
Muted failure 8ms … MVCC Cache 9 refs/heads/master #1010 Tests passed: 10, muted: 9 No changes 14 Jan 19 05:07 publicagent17_9092
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1009 Tests passed: 10, muted: 9 No changes 14 Jan 19 04:16 publicagent15_9094
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1008 Tests passed: 10, muted: 9 No changes 14 Jan 19 03:26 publicagent16_9096
Muted failure 25ms … MVCC Cache 9 refs/heads/master #1007 Tests passed: 10, muted: 9 No changes 14 Jan 19 01:56 publicagent14_9094
Muted failure 10ms … MVCC Cache 9 refs/heads/master #1006 Tests passed: 10, muted: 9 No changes 14 Jan 19 01:06 publicagent16_9093
Muted failure 20ms … MVCC Cache 9 pull/5814/head #1005 Tests passed: 9, ignored: 1, muted: 9 Oleg Ignatenko (79) 14 Jan 19 00:16 publicagent06_9092
Muted failure 13ms … MVCC Cache 9 refs/heads/master #1004 Tests passed: 10, muted: 9 No changes 13 Jan 19 23:47 publicagent16_9092
... etc{noformat}
----
I found this out while re-running the TC bot to get a visa for IGNITE-10796,
because I had picked up the unmuted test from master. I re-ran the MVCC 9 suite
several times; every time it failed with an execution timeout, and it passed
only after I suppressed execution of the reproducer again.
A typical thread dump I observed from the timed-out test:
{noformat}
"sys-stripe-0-#557%distributed.IgniteTxCachePrimarySyncTest0%" #631 prio=5 os_prio=0 tid=0x00007f7861d06000 nid=0x73583 waiting on condition [0x00007f7817af9000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.ignite.internal.util.StripedExecutor$StripeConcurrentQueue.take(StripedExecutor.java:672)
        at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:494)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
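For anyone who wants to collect the same kind of information from inside a test rather than from the TeamCity log, here is a minimal sketch using the standard JDK ThreadMXBean. It only lists threads whose name starts with a given prefix and that are in the WAITING state, as the stripe thread above is. The class name ParkedThreadFinder, its methods and the "sys-stripe-demo" thread name are hypothetical and are not part of Ignite or its test framework.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical helper, not Ignite code: lists WAITING threads by name prefix. */
public class ParkedThreadFinder {
    /** Returns the names of live threads that start with the prefix and are in WAITING state. */
    public static List<String> waitingThreads(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> res = new ArrayList<>();

        // Monitor and synchronizer details are not needed here, so both flags are false.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null
                && info.getThreadName().startsWith(namePrefix)
                && info.getThreadState() == Thread.State.WAITING)
                res.add(info.getThreadName());
        }

        return res;
    }

    /** Starts a daemon thread that stays parked until interrupted; returns once it is WAITING. */
    public static Thread startParkedDaemon(String name) {
        Thread t = new Thread(() -> {
            // park() may return spuriously, so park again until interrupted.
            while (!Thread.currentThread().isInterrupted())
                LockSupport.park();
        }, name);

        t.setDaemon(true);
        t.start();

        while (t.getState() != Thread.State.WAITING)
            Thread.yield();

        return t;
    }

    public static void main(String[] args) {
        Thread t = startParkedDaemon("sys-stripe-demo");

        System.out.println(waitingThreads("sys-stripe-"));

        t.interrupt();
    }
}
```

Note that a stripe thread parked in StripeConcurrentQueue.take, as in the dump above, is waiting for a task to arrive in its queue, so such a listing is only a starting point for locating the thread that is actually blocked.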
(i) Reopening the ticket because of the above. In case I am mistaken:
[~amashenkov], [~gvvinblade], if you can provide successful TeamCity execution
results for this test case (or, better yet, a TC bot visa for this PR), please
feel free to close it again.
> MVCC: Update operation may hangs on backup on unstable topology.
> -----------------------------------------------------------------
>
> Key: IGNITE-10518
> URL: https://issues.apache.org/jira/browse/IGNITE-10518
> Project: Ignite
> Issue Type: Bug
> Components: mvcc
> Reporter: Andrew Mashenkov
> Assignee: Andrew Mashenkov
> Priority: Critical
> Labels: Hanging, failover, mvcc_stabilization_stage_1
> Fix For: 2.8
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> An update operation may hang on a backup while awaiting the next topology.
> Symptoms:
> # Exchange for topology version 6.1 has finished.
> # Exchange for topology version 6.2 is waiting for partition release.
> # DhtTxRemote is waiting for the exchange.
> It seems the tx is mapped on an outdated topology version.
> Reproducer: IgniteTxCachePrimarySyncTest.testSingleKeyCommit() in MVCC mode.
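The three symptoms above read like a circular wait, which can be modeled with plain JDK futures. This is only a toy sketch and not Ignite code: the class CircularWaitSketch, the method txHangs and the future names are hypothetical. It assumes that partition release for exchange 6.2 completes only once the remote tx has finished, and that a tx mapped on the outdated topology version waits for exchange 6.2 before it can finish.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of the reported symptoms; all names are hypothetical, none are Ignite APIs. */
public class CircularWaitSketch {
    /** Returns true if the modeled tx does not finish within the timeout. */
    public static boolean txHangs(boolean mappedOnOutdatedTopology, long timeoutMs) {
        // Symptom 1: exchange for topology version 6.1 has already finished.
        CompletableFuture<Void> exchange61 = CompletableFuture.completedFuture(null);

        CompletableFuture<Void> txFinished = new CompletableFuture<>();

        // Symptom 2 (assumption): exchange 6.2 waits for partition release,
        // which in this model completes only after the tx has finished.
        CompletableFuture<Void> exchange62 = txFinished.thenRun(() -> { });

        // Symptom 3 (assumption): a tx mapped on the outdated version waits
        // for exchange 6.2, otherwise it proceeds on the finished exchange 6.1.
        CompletableFuture<Void> awaited = mappedOnOutdatedTopology ? exchange62 : exchange61;

        awaited.thenRun(() -> txFinished.complete(null));

        try {
            txFinished.get(timeoutMs, TimeUnit.MILLISECONDS);

            return false;
        }
        catch (TimeoutException e) {
            // Neither future can complete first: the circular wait.
            return true;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("outdated mapping hangs: " + txHangs(true, 200));
        System.out.println("current mapping hangs: " + txHangs(false, 200));
    }
}
```

In this model the only way out is for the tx to stop depending on exchange 6.2, which matches the suspicion in the description that the mapping on an outdated topology version is the cause.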