[
https://issues.apache.org/jira/browse/IGNITE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651671#comment-16651671
]
Alexey Goncharuk commented on IGNITE-9756:
------------------------------------------
[~xtern], if I remember correctly, we added deduplication of eviction
requests to keep the queue from growing very large, because we attempt to clear
the partition after each transaction commit. I think we can add correct
synchronization here: add the task to both the queue and the set under the
mutex, and remove the task from the set under the same mutex before executing
it. This should keep the deduplication logic and fix the problem you described.
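
A minimal sketch of the synchronization proposed above, not the actual Ignite
code. The class DedupEvictionQueue, its methods offer and runNext, and the
field names are all hypothetical and used only for illustration. It assumes
that only the removal from the set happens under the mutex, while the task
itself runs outside of it.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical illustration only; names do not come from the Ignite code base. */
class DedupEvictionQueue {
    /** Pairs a partition id with the work to run for it. */
    private static final class Entry {
        final int partId;
        final Runnable task;

        Entry(int partId, Runnable task) {
            this.partId = partId;
            this.task = task;
        }
    }

    /** Single mutex guarding both the queue and the deduplication set. */
    private final Object mux = new Object();

    private final Queue<Entry> queue = new ArrayDeque<>();

    /** Ids of partitions that have a task waiting in the queue. */
    private final Set<Integer> queuedParts = new HashSet<>();

    /** @return false if a task for this partition is already waiting in the queue. */
    boolean offer(int partId, Runnable task) {
        synchronized (mux) {
            if (!queuedParts.add(partId))
                return false; // Duplicate request, the queue does not grow.

            queue.add(new Entry(partId, task));

            return true;
        }
    }

    /** Runs one queued task, if any. @return true if a task was run. */
    boolean runNext() {
        Entry e;

        synchronized (mux) {
            e = queue.poll();

            if (e == null)
                return false;

            // Removed before execution, so a request that arrives while
            // the task is running is queued again instead of being dropped.
            queuedParts.remove(e.partId);
        }

        e.task.run(); // Assumption: the task itself runs outside the mutex.

        return true;
    }
}
```

With this ordering, a second request for a partition is rejected only while
the first one is still waiting in the queue. Once a task has started, a new
request for the same partition is accepted, so entries unlocked by a
transaction that commits during the run still get another clearing attempt.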
> [Test Failed] IgniteCacheIncrementTxTest.testIncrementTxTopologyChange2 fails
> sometimes in master.
> --------------------------------------------------------------------------------------------------
>
> Key: IGNITE-9756
> URL: https://issues.apache.org/jira/browse/IGNITE-9756
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.6
> Reporter: Pavel Pereslegin
> Assignee: Pavel Pereslegin
> Priority: Major
> Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> IgniteCacheIncrementTxTest.testIncrementTxTopologyChange2 sometimes fails in
> master with a timeout.
> Example of such a failure:
> [https://ci.ignite.apache.org/viewLog.html?buildId=1977579&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_Cache2#testNameId-613377372188362920]
> Typical log output:
>
> {noformat}
> [2018-10-03 19:40:32,654][INFO
> ][sys-#438%cache.IgniteCacheIncrementTxTest1%][GridDhtPartitionDemander]
> Started rebalance routine [default,
> supplier=67fdbd60-24fe-4810-a6a6-41a949b00003, topic=0, fullPartitions=[28,
> 31, 33, 40, 43, 56, 61, 63, 70, 86, 93, 107, 115, 129, 149, 153, 167, 187,
> 207, 215, 218, 224, 247, 279, 284, 290, 329, 332, 342, 373, 377, 383,
> 385-386, 423, 435, 469, 478, 494, 515, 525, 528, 537, 565, 603, 607, 610,
> 624, 654, 686, 707, 718, 738, 741, 746, 766, 775, 777, 797, 807, 809, 814,
> 822, 849, 856, 872, 876, 909, 911, 914, 925, 940, 943, 962, 983, 991, 1005,
> 1014], histPartitions=[]]
> [2018-10-03 19:40:32,654][INFO
> ][sys-#175%cache.IgniteCacheIncrementTxTest0%][GridDhtPartitionSupplier]
> Finished supplying rebalancing [grp=default,
> demander=688062dd-508d-4ebc-9458-a48e1ba00002, topVer=AffinityTopologyVersion
> [topVer=45, minorTopVer=0], topic=0]
> [2018-10-03 19:40:32,654][INFO
> ][sys-#185%cache.IgniteCacheIncrementTxTest3%][GridDhtPartitionSupplier]
> Finished supplying rebalancing [grp=default,
> demander=3f09a855-390b-40ce-b3e0-8b411db00001, topVer=AffinityTopologyVersion
> [topVer=45, minorTopVer=0], topic=0]
> [2018-10-03 19:40:32,654][INFO
> ][sys-#179%cache.IgniteCacheIncrementTxTest0%][GridDhtPartitionSupplier]
> Finished supplying rebalancing [grp=default,
> demander=67fdbd60-24fe-4810-a6a6-41a949b00003, topVer=AffinityTopologyVersion
> [topVer=45, minorTopVer=0], topic=0]
> [2018-10-03 19:40:32,654][INFO
> ][sys-#234%cache.IgniteCacheIncrementTxTest1%][GridDhtPartitionDemander]
> Completed rebalancing [grp=default,
> supplier=67fdbd60-24fe-4810-a6a6-41a949b00003, topVer=AffinityTopologyVersion
> [topVer=45, minorTopVer=0], progress=1/2, time=0 ms]
> [2018-10-03 19:40:32,654][INFO
> ][sys-#237%cache.IgniteCacheIncrementTxTest3%][GridDhtPartitionDemander]
> Completed (final) rebalancing [grp=default,
> supplier=4b3e5c6e-cec4-4fb6-b1b2-47fd71900000, topVer=AffinityTopologyVersion
> [topVer=45, minorTopVer=0], progress=2/2, time=0 ms]
> [2018-10-03 19:40:32,654][INFO
> ][sys-#237%cache.IgniteCacheIncrementTxTest3%][GridDhtPartitionDemander]
> Completed rebalance future: RebalanceFuture [grp=CacheGroupContext
> [grp=default], topVer=AffinityTopologyVersion [topVer=45, minorTopVer=0],
> rebalanceId=96, routines=2]
> [2018-10-03 19:40:32,655][INFO
> ][sys-#162%cache.IgniteCacheIncrementTxTest2%][GridDhtPartitionDemander]
> Completed (final) rebalancing [grp=default,
> supplier=4b3e5c6e-cec4-4fb6-b1b2-47fd71900000, topVer=AffinityTopologyVersion
> [topVer=45, minorTopVer=0], progress=2/2, time=0 ms]
> [2018-10-03 19:40:32,655][INFO
> ][sys-#162%cache.IgniteCacheIncrementTxTest2%][GridDhtPartitionDemander]
> Completed rebalance future: RebalanceFuture [grp=CacheGroupContext
> [grp=default], topVer=AffinityTopologyVersion [topVer=45, minorTopVer=0],
> rebalanceId=96, routines=2]
> [2018-10-03 19:40:33,260][INFO
> ][exchange-worker-#38%cache.IgniteCacheIncrementTxTest0%][GridCachePartitionExchangeManager]
> Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
> [topVer=45, minorTopVer=0], force=true, evt=NODE_LEFT,
> node=f675cf49-5db3-45b3-83fb-7a7788400009]
> [2018-10-03 19:40:33,261][INFO
> ][exchange-worker-#151%cache.IgniteCacheIncrementTxTest1%][GridDhtPartitionDemander]
> Cancelled rebalancing from all nodes [grp=default,
> topVer=AffinityTopologyVersion [topVer=45, minorTopVer=0]]
> [2018-10-03 19:40:33,261][INFO
> ][exchange-worker-#151%cache.IgniteCacheIncrementTxTest1%][GridDhtPartitionDemander]
> Completed rebalance future: RebalanceFuture [grp=CacheGroupContext
> [grp=default], topVer=AffinityTopologyVersion [topVer=45, minorTopVer=0],
> rebalanceId=96, routines=2]
> [2018-10-03 19:40:33,261][INFO
> ][exchange-worker-#151%cache.IgniteCacheIncrementTxTest1%][GridCachePartitionExchangeManager]
> Rebalancing scheduled [order=[default], top=AffinityTopologyVersion
> [topVer=45, minorTopVer=0], force=true, evt=NODE_LEFT,
> node=f675cf49-5db3-45b3-83fb-7a7788400009]
> [2018-10-03 19:40:33,262][INFO
> ][exchange-worker-#151%cache.IgniteCacheIncrementTxTest1%][GridDhtPartitionDemander]
> Prepared rebalancing [grp=default, mode=ASYNC,
> supplier=4b3e5c6e-cec4-4fb6-b1b2-47fd71900000, partitionsCount=97,
> topVer=AffinityTopologyVersion [topVer=45, minorTopVer=0], parallelism=1]
> [2018-10-03 19:40:33,899][INFO
> ][sys-#155%cache.IgniteCacheIncrementTxTest0%][GridCachePartitionExchangeManager]
> Full Message creating for AffinityTopologyVersion [topVer=45, minorTopVer=0]
> performed in 1 ms.
> [2018-10-03 19:40:33,903][INFO
> ][sys-#155%cache.IgniteCacheIncrementTxTest0%][GridCachePartitionExchangeManager]
> Sending Full Message for AffinityTopologyVersion [topVer=45, minorTopVer=0]
> performed in 4 ms.
> [2018-10-03 19:40:35,658][INFO
> ][sys-#179%cache.IgniteCacheIncrementTxTest0%][GridCachePartitionExchangeManager]
> Full Message creating for AffinityTopologyVersion [topVer=45, minorTopVer=0]
> performed in 0 ms.
> [2018-10-03 19:40:35,663][INFO
> ][sys-#179%cache.IgniteCacheIncrementTxTest0%][GridCachePartitionExchangeManager]
> Sending Full Message for AffinityTopologyVersion [topVer=45, minorTopVer=0]
> performed in 5 ms.
> [2018-10-03 19:43:41,700][INFO
> ][tcp-disco-sock-reader-#226%cache.IgniteCacheIncrementTxTest0%][TestTcpDiscoverySpi]
> Finished serving remote node connection [rmtAddr=/127.0.0.1:44749,
> rmtPort=44749
> [2018-10-03 19:45:25,806][ERROR][main][root] Test has been timed out and will
> be interrupted (threads dump will be taken before interruption)
> [test=testIncrementTxTopologyChange2, timeout=300000]
> {noformat}
> The test forces cache rebalancing (2 backups, full_sync) on different nodes and
> hangs after cancelling the previous rebalance.
> *Update*:
> The test sometimes fails because the partition cannot change state from
> {{RENTING}} to {{EVICTED}}: the eviction task cannot clean up locked cache
> entries ("_Entry could not be marked obsolete (it is still used or has
> readers)_"), yet the task completes successfully and leaves the partition in
> the {{RENTING}} state.
> Meanwhile, the transaction completes and unlocks the entries, but no new
> eviction task is started because IGNITE-9244 added a deduplication check by
> partition id.
> The partition then goes to the {{MOVING}} state and the rebalancing hangs,
> waiting for the clearFuture to complete.