Aleksey Plekhanov created IGNITE-13747:
------------------------------------------
Summary: Coordinator failure after node left: Unexpected rebalance
on rebalanced cluster
Key: IGNITE-13747
URL: https://issues.apache.org/jira/browse/IGNITE-13747
Project: Ignite
Issue Type: Bug
Reporter: Aleksey Plekhanov
In some cases, the exchange worker on the coordinator terminates after a node
leaves, with the following stack trace:
{noformat}
java.lang.AssertionError: Unexpected rebalance on rebalanced cluster:
assignments=GridDhtPreloaderAssignments [exchangeId=GridDhtPartitionExchangeId
[topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0],
discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
[id=0f61f2f6-6ffb-4772-a6a6-d2f411600002, consistentId=127.0.0.1:47502,
addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47502],
discPort=47502, order=66, intOrder=35, lastExchangeTime=1606212165542,
loc=false, ver=2.10.0#20201124-sha1:00000000, isClient=false], topVer=67,
msgTemplate=null,
span=org.apache.ignite.internal.processors.tracing.NoopSpan@73893b7d,
nodeId8=30b8d2de, msg=Node left, type=NODE_LEFT, tstamp=1606212166204],
nodeId=0f61f2f6, evt=NODE_LEFT], topVer=AffinityTopologyVersion [topVer=67,
minorTopVer=0], cancelled=false, affinityReassign=false,
super={TcpDiscoveryNode [id=cfa40f59-ed19-4d5e-9d62-55f44a100001,
consistentId=127.0.0.1:47501, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet
[/127.0.0.1:47501], discPort=47501, order=2, intOrder=2,
lastExchangeTime=1606212120855, loc=false, ver=2.10.0#20201124-sha1:00000000,
isClient=false]=GridDhtPartitionDemandMessage [rebalanceId=506,
parts=IgniteDhtDemandedPartitionsMap [historical=null, full=HashSet [8]],
timeout=0, workerId=-1, topVer=AffinityTopologyVersion [topVer=67,
minorTopVer=0], partCnt=1, super=GridCacheGroupIdMessage [grpId=94416770]]}],
locPart=[topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0],
lastChangeTopVer=AffinityTopologyVersion [topVer=67, minorTopVer=0],
waitRebalance=false, nodes=[cfa40f59-ed19-4d5e-9d62-55f44a100001,
30b8d2de-d610-4dd4-aff2-fe4098b00000], locPart=GridDhtLocalPartition
[rmvQueueMaxSize=1024, rmvdEntryTtl=10000, id=8, delayedRenting=true,
finishFutRef=null, clearVer=1606261964138, grp=cache, state=MOVING,
reservations=0, empty=false, createTime=11/24/2020 13:02:45, fullSize=68,
cntr=Counter [init=0, val=2003]], ver4=AffinityTopologyVersion [topVer=67,
minorTopVer=0], affOwners4=[cfa40f59-ed19-4d5e-9d62-55f44a100001,
30b8d2de-d610-4dd4-aff2-fe4098b00000], ver3=AffinityTopologyVersion [topVer=66,
minorTopVer=0], affOwners3=[cfa40f59-ed19-4d5e-9d62-55f44a100001,
0f61f2f6-6ffb-4772-a6a6-d2f411600002], ver2=AffinityTopologyVersion [topVer=65,
minorTopVer=0], affOwners2=[cfa40f59-ed19-4d5e-9d62-55f44a100001,
30b8d2de-d610-4dd4-aff2-fe4098b00000], ver1=AffinityTopologyVersion [topVer=64,
minorTopVer=0], affOwners1=[cfa40f59-ed19-4d5e-9d62-55f44a100001,
259eb107-e17a-4a8f-9ad8-4653f8500002], ver0=AffinityTopologyVersion [topVer=63,
minorTopVer=0], affOwners0=[cfa40f59-ed19-4d5e-9d62-55f44a100001,
30b8d2de-d610-4dd4-aff2-fe4098b00000]]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.generateAssignments(GridDhtPreloader.java:302)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3483)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3184)
{noformat}
Reproducer:
{code:java}
@Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
    return super.getConfiguration(igniteInstanceName)
        .setCacheConfiguration(new CacheConfiguration<>("cache")
            .setBackups(1)
            .setEvictionPolicyFactory(() -> new LruEvictionPolicy<>().setMaxSize(100))
            .setOnheapCacheEnabled(true)
            .setNearConfiguration(new NearCacheConfiguration<>())
            .setAffinity(new RendezvousAffinityFunction(false, 10))
        );
}

@Test
public void test() throws Exception {
    startGrids(4);

    try {
        AtomicInteger gridIdx = new AtomicInteger();

        long ts = U.currentTimeMillis();

        // Two threads put random keys for 150 seconds, one via grid 0 and one via grid 1.
        GridTestUtils.runMultiThreadedAsync(() -> {
            IgniteCache<Integer, Integer> cache = grid(gridIdx.getAndIncrement()).cache("cache");

            while (U.currentTimeMillis() - ts < 150_000L)
                cache.put(ThreadLocalRandom.current().nextInt(100_000), 0);
        }, 2, "put-worker");

        // Meanwhile, restart node 2 in a loop for the same 150 seconds.
        while (U.currentTimeMillis() - ts < 150_000L) {
            stopGrid(2);
            startGrid(2);
        }
    }
    finally {
        stopAllGrids();
    }
}
{code}
The test
GridCachePartitionedOptimisticTxNodeRestartTest#testRestartWithTxFourNodesOneBackupsOffheapEvict
is also flaky on TeamCity for this reason.