[
https://issues.apache.org/jira/browse/IGNITE-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladislav Pyatkov reassigned IGNITE-14138:
------------------------------------------
Assignee: Vladislav Pyatkov
> Historical rebalance kills cluster
> ----------------------------------
>
> Key: IGNITE-14138
> URL: https://issues.apache.org/jira/browse/IGNITE-14138
> Project: Ignite
> Issue Type: Bug
> Reporter: Vladislav Pyatkov
> Assignee: Vladislav Pyatkov
> Priority: Major
>
> {noformat}
> [2021-01-12T05:11:02,142][ERROR][rebalance-#508%---%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Failed to continue supplying [grp=SQL_USAGES_EPE, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]]]
> org.apache.ignite.IgniteCheckedException: Failed to continue supplying [grp=SQL_1, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:571) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:398) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:489) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:474) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$800(GridCacheIoManager.java:109) [ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1707) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:157) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:3011) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1662) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4900(GridIoManager.java:157) [ignite-core.jar]
>     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1629) [ignite-core.jar]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>     at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Could not find start pointer for partition [part=4, partCntrSince=1115]
>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.searchEarliestWalPointer(CheckpointHistory.java:557) ~[ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.historicalIterator(GridCacheOffheapManager.java:1121) ~[ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.rebalanceIterator(IgniteCacheOffheapManagerImpl.java:1195) ~[ignite-core.jar]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:322) ~[ignite-core.jar]
>     ... 16 more
> {noformat}
> I believe this code should throw IgniteHistoricalIteratorException instead of
> IgniteCheckedException, so that the error can be handled properly and the
> rebalance can fall back to a full rebalance instead of killing nodes.
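
The proposal above can be illustrated with a minimal, self-contained sketch. It does not use Ignite's real classes or signatures: GenericCheckedException, HistoryUnavailableException, Outcome, supply and the TreeMap-based history are all hypothetical stand-ins, chosen only to show how a dedicated exception type lets the caller tell "history is missing" apart from a genuinely critical failure.

```java
import java.util.Map;
import java.util.TreeMap;

public class RebalanceFallbackSketch {
    /** Hypothetical stand-in for a generic checked failure. */
    static class GenericCheckedException extends Exception {
        GenericCheckedException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical stand-in for a dedicated "history is unavailable" failure. */
    static class HistoryUnavailableException extends GenericCheckedException {
        HistoryUnavailableException(String msg) {
            super(msg);
        }
    }

    /** What the supplier ends up doing for a demand request. */
    enum Outcome { HISTORICAL, FULL, NODE_FAILURE }

    /**
     * Assumed model of checkpoint history: partition counter -> WAL pointer.
     * Returns the pointer of the latest entry at or before sinceCntr.
     */
    static long searchEarliestWalPointer(TreeMap<Long, Long> history, int part, long sinceCntr)
        throws GenericCheckedException {
        Map.Entry<Long, Long> entry = history.floorEntry(sinceCntr);

        if (entry == null) {
            // Specific type: the caller can recognise this case and recover.
            throw new HistoryUnavailableException(
                "Could not find start pointer for partition [part=" + part
                    + ", partCntrSince=" + sinceCntr + "]");
        }

        return entry.getValue();
    }

    /** Chooses a rebalance mode instead of treating every error as critical. */
    static Outcome supply(TreeMap<Long, Long> history, int part, long sinceCntr) {
        try {
            searchEarliestWalPointer(history, part, sinceCntr);

            return Outcome.HISTORICAL;
        }
        catch (HistoryUnavailableException e) {
            // Recoverable: fall back to a full rebalance of the partition.
            return Outcome.FULL;
        }
        catch (GenericCheckedException e) {
            // Anything else would still go to the failure handler.
            return Outcome.NODE_FAILURE;
        }
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> history = new TreeMap<>();
        history.put(2000L, 42L); // Earliest retained counter is 2000.

        System.out.println(supply(history, 4, 1115L)); // Counter older than history.
        System.out.println(supply(history, 4, 2500L)); // Counter covered by history.
    }
}
```

In this sketch the request for counter 1115 is older than anything retained in the assumed history, so it is answered with a full rebalance rather than a node failure. How the real supplier reacts to IgniteHistoricalIteratorException is not shown in this message and would need to be checked against the Ignite sources.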