[ https://issues.apache.org/jira/browse/IGNITE-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281591#comment-17281591 ]

Vladislav Pyatkov commented on IGNITE-14138:
--------------------------------------------

[~sk0x50] Please review my changes.

> Historical rebalance kills cluster
> ----------------------------------
>
>                 Key: IGNITE-14138
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14138
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> [2021-01-12T05:11:02,142][ERROR][rebalance-#508%---%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Failed to continue supplying [grp=SQL_USAGES_EPE, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]]]
> org.apache.ignite.IgniteCheckedException: Failed to continue supplying [grp=SQL_1, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:571) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:398) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:489) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:474) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$800(GridCacheIoManager.java:109) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1707) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:157) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:3011) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1662) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4900(GridIoManager.java:157) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1629) [ignite-core.jar]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>     at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Could not find start pointer for partition [part=4, partCntrSince=1115]
>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.searchEarliestWalPointer(CheckpointHistory.java:557) ~[ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.historicalIterator(GridCacheOffheapManager.java:1121) ~[ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.rebalanceIterator(IgniteCacheOffheapManagerImpl.java:1195) ~[ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:322) ~[ignite-core.jar]
>     ... 16 more
> {noformat}
> I believe that it should throw IgniteHistoricalIteratorException instead of
> IgniteCheckedException, so it can be properly handled and rebalance can move
> to the full rebalance instead of killing nodes

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
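
To illustrate the proposal in the issue description, here is a minimal, self-contained Java sketch of the idea: a dedicated exception type for "history does not reach back far enough" lets the supplier side catch that one case and fall back to full rebalance, rather than surfacing it as a critical error. None of this is Ignite code. The class names `RebalanceFallbackSketch`, `CheckedFailure`, and `HistoryUnavailable`, and the methods `retainHistory`, `findStart`, and `supply`, are hypothetical stand-ins chosen for illustration, and the per-partition counter map is an assumed simplification of the checkpoint history.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not Ignite's API. */
final class RebalanceFallbackSketch {
    /** Stand-in for a generic checked failure. */
    static class CheckedFailure extends Exception {
        CheckedFailure(String msg) { super(msg); }
    }

    /** Stand-in for a dedicated "history unavailable" failure. */
    static class HistoryUnavailable extends CheckedFailure {
        HistoryUnavailable(String msg) { super(msg); }
    }

    enum Mode { HISTORICAL, FULL }

    /** Earliest update counter still covered by retained history, per partition. */
    private final Map<Integer, Long> earliestCntr = new HashMap<>();

    void retainHistory(int part, long fromCntr) {
        earliestCntr.put(part, fromCntr);
    }

    /** Throws the dedicated type when history does not reach back to {@code since}. */
    long findStart(int part, long since) throws HistoryUnavailable {
        Long earliest = earliestCntr.get(part);

        if (earliest == null || earliest > since)
            throw new HistoryUnavailable("Could not find start pointer for partition [part="
                + part + ", partCntrSince=" + since + ']');

        return earliest;
    }

    /** Catches only the dedicated type, so this case degrades to a full rebalance. */
    Mode supply(int part, long since) {
        try {
            findStart(part, since);

            return Mode.HISTORICAL;
        }
        catch (HistoryUnavailable e) {
            return Mode.FULL;
        }
    }

    public static void main(String[] args) {
        RebalanceFallbackSketch s = new RebalanceFallbackSketch();

        // History for partition 4 starts after the requested counter 1115.
        s.retainHistory(4, 2000L);

        System.out.println(s.supply(4, 1115L));
    }
}
```

Because `supply` catches only the narrow type, any other failure would still propagate as before; only the missing-history case changes behavior.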