Aleksey Plekhanov created IGNITE-13747:
------------------------------------------

             Summary: Coordinator failure after node left: Unexpected rebalance 
on rebalanced cluster
                 Key: IGNITE-13747
                 URL: https://issues.apache.org/jira/browse/IGNITE-13747
             Project: Ignite
          Issue Type: Bug
            Reporter: Aleksey Plekhanov


Exchange worker terminated on a coordinator after node left in some cases with 
stack trace:

{noformat}
java.lang.AssertionError: Unexpected rebalance on rebalanced cluster: 
assignments=GridDhtPreloaderAssignments [exchangeId=GridDhtPartitionExchangeId 
[topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], 
discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=0f61f2f6-6ffb-4772-a6a6-d2f411600002, consistentId=127.0.0.1:47502, 
addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47502], 
discPort=47502, order=66, intOrder=35, lastExchangeTime=1606212165542, 
loc=false, ver=2.10.0#20201124-sha1:00000000, isClient=false], topVer=67, 
msgTemplate=null, 
span=org.apache.ignite.internal.processors.tracing.NoopSpan@73893b7d, 
nodeId8=30b8d2de, msg=Node left, type=NODE_LEFT, tstamp=1606212166204], 
nodeId=0f61f2f6, evt=NODE_LEFT], topVer=AffinityTopologyVersion [topVer=67, 
minorTopVer=0], cancelled=false, affinityReassign=false, 
super={TcpDiscoveryNode [id=cfa40f59-ed19-4d5e-9d62-55f44a100001, 
consistentId=127.0.0.1:47501, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet 
[/127.0.0.1:47501], discPort=47501, order=2, intOrder=2, 
lastExchangeTime=1606212120855, loc=false, ver=2.10.0#20201124-sha1:00000000, 
isClient=false]=GridDhtPartitionDemandMessage [rebalanceId=506, 
parts=IgniteDhtDemandedPartitionsMap [historical=null, full=HashSet [8]], 
timeout=0, workerId=-1, topVer=AffinityTopologyVersion [topVer=67, 
minorTopVer=0], partCnt=1, super=GridCacheGroupIdMessage [grpId=94416770]]}], 
locPart=[topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], 
lastChangeTopVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], 
waitRebalance=false, nodes=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 
30b8d2de-d610-4dd4-aff2-fe4098b00000], locPart=GridDhtLocalPartition 
[rmvQueueMaxSize=1024, rmvdEntryTtl=10000, id=8, delayedRenting=true, 
finishFutRef=null, clearVer=1606261964138, grp=cache, state=MOVING, 
reservations=0, empty=false, createTime=11/24/2020 13:02:45, fullSize=68, 
cntr=Counter [init=0, val=2003]], ver4=AffinityTopologyVersion [topVer=67, 
minorTopVer=0], affOwners4=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 
30b8d2de-d610-4dd4-aff2-fe4098b00000], ver3=AffinityTopologyVersion [topVer=66, 
minorTopVer=0], affOwners3=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 
0f61f2f6-6ffb-4772-a6a6-d2f411600002], ver2=AffinityTopologyVersion [topVer=65, 
minorTopVer=0], affOwners2=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 
30b8d2de-d610-4dd4-aff2-fe4098b00000], ver1=AffinityTopologyVersion [topVer=64, 
minorTopVer=0], affOwners1=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 
259eb107-e17a-4a8f-9ad8-4653f8500002], ver0=AffinityTopologyVersion [topVer=63, 
minorTopVer=0], affOwners0=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 
30b8d2de-d610-4dd4-aff2-fe4098b00000]]
  at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.generateAssignments(GridDhtPreloader.java:302)
  at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3483)
  at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3184)
{noformat}

Reproducer:

{code:java}
    @Override protected IgniteConfiguration getConfiguration(String 
igniteInstanceName) throws Exception {
        return super.getConfiguration(igniteInstanceName)
            .setCacheConfiguration(new CacheConfiguration<>("cache")
                .setBackups(1)
                .setEvictionPolicyFactory(() -> new 
LruEvictionPolicy<>().setMaxSize(100))
                .setOnheapCacheEnabled(true)
                .setNearConfiguration(new NearCacheConfiguration<>())
                .setAffinity(new RendezvousAffinityFunction(false, 10))
            );
    }

    @Test
    public void test() throws Exception {
        startGrids(4);

        try {
            AtomicInteger gridIdx = new AtomicInteger();

            long ts = U.currentTimeMillis();

            GridTestUtils.runMultiThreadedAsync(() -> {
                IgniteCache<Integer, Integer> cache = 
grid(gridIdx.getAndIncrement()).cache("cache");

                while (U.currentTimeMillis() - ts < 150_000L)
                    cache.put(ThreadLocalRandom.current().nextInt(100_000), 0);
            }, 2, "put-worker");

            while (U.currentTimeMillis() - ts < 150_000L) {
                stopGrid(2);
                startGrid(2);
            }
        }
        finally {
            stopAllGrids();
        }
    }
{code}

Also test 
GridCachePartitionedOptimisticTxNodeRestartTest#testRestartWithTxFourNodesOneBackupsOffheapEvict
 flaky on Team-City for this reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to