[ https://issues.apache.org/jira/browse/IGNITE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743399#comment-16743399 ]
Oleg Ignatenko edited comment on IGNITE-10518 at 1/15/19 9:48 PM:
------------------------------------------------------------------

(x) TeamCity history for the reproducer ([IgniteTxCachePrimarySyncTest.testSingleKeyCommitFromPrimary|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=4989034880085631279&tab=testDetails]) suggests that the problem has not been fixed: I checked the last 100 execution results, covering about 30 days since Dec 16 2018, and every one of them shows the same "muted failure" result:
{noformat}
Test status Duration Build Info Changes Agent
Muted failure 18ms … MVCC Cache 9 pull/5823/head #1023 Tests passed: 10, muted: 9 andrey.mashenk… (2) 14 Jan 19 17:34 publicagent17_9096
Muted failure 12ms … MVCC Cache 9 refs/heads/master #1020 Tests passed: 10, muted: 9 No changes 14 Jan 19 14:10 publicagent07_9092
Muted failure 24ms … MVCC Cache 9 refs/heads/master #1019 Tests passed: 10, muted: 9 No changes 14 Jan 19 13:06 publicagent13_9096
Muted failure 17ms … MVCC Cache 9 refs/heads/master #1018 Tests passed: 10, muted: 9 Changes (2) 14 Jan 19 12:17 publicagent10_9092
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1017 Tests passed: 10, muted: 9 Changes (2) 14 Jan 19 11:16 publicagent14_9096
Muted failure 14ms … MVCC Cache 9 refs/heads/master #1016 Tests passed: 10, muted: 9 No changes 14 Jan 19 10:06 publicagent11_9092
Muted failure 15ms … MVCC Cache 9 refs/heads/master #1015 Tests passed: 10, muted: 9 No changes 14 Jan 19 09:17 publicagent10_9096
Muted failure 12ms … MVCC Cache 9 refs/heads/master #1014 Tests passed: 10, muted: 9 No changes 14 Jan 19 08:28 publicagent11_9092
Muted failure 25ms … MVCC Cache 9 refs/heads/master #1013 Tests passed: 10, muted: 9 No changes 14 Jan 19 07:36 publicagent17_9091
Muted failure 16ms … MVCC Cache 9 refs/heads/master #1012 Tests passed: 10, muted: 9 No changes 14 Jan 19 06:46 publicagent11_9096
Muted failure 26ms … MVCC Cache 9 refs/heads/master #1011 Tests passed: 10, muted: 9 No changes 14 Jan 19 05:56 publicagent09_9094
Muted failure 8ms … MVCC Cache 9 refs/heads/master #1010 Tests passed: 10, muted: 9 No changes 14 Jan 19 05:07 publicagent17_9092
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1009 Tests passed: 10, muted: 9 No changes 14 Jan 19 04:16 publicagent15_9094
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1008 Tests passed: 10, muted: 9 No changes 14 Jan 19 03:26 publicagent16_9096
Muted failure 25ms … MVCC Cache 9 refs/heads/master #1007 Tests passed: 10, muted: 9 No changes 14 Jan 19 01:56 publicagent14_9094
Muted failure 10ms … MVCC Cache 9 refs/heads/master #1006 Tests passed: 10, muted: 9 No changes 14 Jan 19 01:06 publicagent16_9093
Muted failure 20ms … MVCC Cache 9 pull/5814/head #1005 Tests passed: 9, ignored: 1, muted: 9 Oleg Ignatenko (79) 14 Jan 19 00:16 publicagent06_9092
Muted failure 13ms … MVCC Cache 9 refs/heads/master #1004 Tests passed: 10, muted: 9 No changes 13 Jan 19 23:47 publicagent16_9092
... etc
{noformat}
----
I found this while re-running TC bot to get a visa for IGNITE-10796, because I had picked up the unmuted test from master. I re-ran the MVCC 9 suite several times; every time it failed with an execution timeout, and it passed only after I suppressed execution of the reproducer again.

Typical thread dump I observed from the timed-out test:
{noformat}
"sys-stripe-0-#557%distributed.IgniteTxCachePrimarySyncTest0%" #631 prio=5 os_prio=0 tid=0x00007f7861d06000 nid=0x73583 waiting on condition [0x00007f7817af9000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
	at org.apache.ignite.internal.util.StripedExecutor$StripeConcurrentQueue.take(StripedExecutor.java:672)
	at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:494)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.lang.Thread.run(Thread.java:748)
{noformat}
(i) Reopening the ticket because of the above. In case I am mistaken: [~amashenkov], [~gvvinblade], if you can provide successful TeamCity execution results for this test case (or, better yet, a TC bot visa for this PR), please feel free to close it again.

was (Author: oignatenko):
[Previous revision; identical to the comment above except that the link text named the reproducer IgniteTxCachePrimarySyncTest0.testSingleKeyCommitFromPrimary.]

> MVCC: Update operation may hangs on backup on unstable topology.
> -----------------------------------------------------------------
>
>                 Key: IGNITE-10518
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10518
>             Project: Ignite
>          Issue Type: Bug
>          Components: mvcc
>            Reporter: Andrew Mashenkov
>            Assignee: Andrew Mashenkov
>            Priority: Critical
>              Labels: Hanging, failover, mvcc_stabilization_stage_1
>             Fix For: 2.8
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update operation may hang on backup while awaiting the next topology.
> Symptoms:
> # Exchange for topology version 6.1 has been finished.
> # Exchange for topology version 6.2 is waiting for partition release.
> # DhtTxRemote waits for exchange.
> It seems the tx maps to an outdated topology version.
> Reproducer: IgniteTxCachePrimarySyncTest.testSingleKeyCommit() in MVCC mode.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)