[
https://issues.apache.org/jira/browse/IGNITE-23803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy resolved IGNITE-23803.
----------------------------------------
Fix Version/s: 3.0
Resolution: Fixed
Fixed by IGNITE-24112
> Node leave causes a critical failure on Catalog update application
> ------------------------------------------------------------------
>
> Key: IGNITE-23803
> URL: https://issues.apache.org/jira/browse/IGNITE-23803
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0
>
>
> A node leaving the topology caused a failure in CatalogManagerImpl (see Log
> chunk 1).
> It seems that, while reacting to the application of some Catalog event, a
> Catalog listener tried to send a message to another node. The recipient node
> left the topology, so the send failed with RecipientLeftException. The failure
> propagated to the CatalogManager, which treats any failure of any of its
> listeners as a fatal error and notifies the FailureHandler. Even worse, the
> same exception then reached WatchProcessor, which halted processing of
> subsequent Metastorage events (see Log chunk 2).
> 'Recipient left' should not be fatal for the node as a whole. Such exceptions
> should be handled carefully in Catalog listeners, or a systemic approach
> should be devised.
>
> Log chunk 1:
>
> 2024-11-27 15:56:57:316 +0000 [WARNING][%ReproducerTest_cluster_2%connection-maintenance-0][CatalogManagerImpl] Failed to apply catalog update.
> java.util.concurrent.CompletionException: org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>     at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>     at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
>     at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>     at org.apache.ignite.internal.raft.RaftGroupServiceImpl.handleThrowable(RaftGroupServiceImpl.java:641)
>     at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$49(RaftGroupServiceImpl.java:618)
>     at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>     at org.apache.ignite.internal.network.OutNetworkObject.failAcknowledgement(OutNetworkObject.java:95)
>     at org.apache.ignite.internal.network.recovery.RecoveryDescriptor.dispose(RecoveryDescriptor.java:265)
>     at org.apache.ignite.internal.network.netty.ConnectionManager.blockAndDisposeDescriptor(ConnectionManager.java:679)
>     at org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:647)
>     at org.apache.ignite.internal.network.netty.ConnectionManager.lambda$handleNodeLeft$9(ConnectionManager.java:623)
>     at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:644)
>     ... 7 more
> 2024-11-27 15:56:57:317 +0000 [INFO][%ReproducerTest_cluster_2%connection-maintenance-0][WatchProcessor] Notification chain encountered an error, so no notifications will be ever fired for subsequent revisions until a restart. Notifying the FailureManager
> 2024-11-27 15:56:57:318 +0000 [ERROR][%ReproducerTest_cluster_2%connection-maintenance-0][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR]
> org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:644)
>     at org.apache.ignite.internal.network.netty.ConnectionManager.lambda$handleNodeLeft$9(ConnectionManager.java:623)
>     at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-11-27 15:56:57:318 +0000 [WARNING][%ReproducerTest_cluster_2%connection-maintenance-0][UpdateLogImpl] Unable to process catalog event
> org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:644)
>     at org.apache.ignite.internal.network.netty.ConnectionManager.lambda$handleNodeLeft$9(ConnectionManager.java:623)
>     at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
>
> Log chunk 2:
>
> 2024-11-27 15:56:57:317 +0000 [INFO][%Gg41011ReproducerTest_cluster_2%connection-maintenance-0][WatchProcessor] Notification chain encountered an error, so no notifications will be ever fired for subsequent revisions until a restart. Notifying the FailureManager
> 2024-11-27 15:56:57:318 +0000 [ERROR][%Gg41011ReproducerTest_cluster_2%connection-maintenance-0][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR]
> org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:644)
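
For illustration only, the per-listener handling suggested in the description could look roughly like the sketch below. It is not taken from IGNITE-24112 and does not use Ignite classes: RecipientLeftTolerance, tolerateRecipientLeft and unwrap are hypothetical names, and the nested RecipientLeftException is a local stand-in for org.apache.ignite.internal.network.RecipientLeftException so that the snippet compiles on its own. The idea is to unwrap the CompletionException layer seen in Log chunk 1 and to treat a departed recipient as a non-fatal outcome, while letting every other failure propagate.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class RecipientLeftTolerance {
    /** Hypothetical stand-in for Ignite's RecipientLeftException (assumption, not the real class). */
    static class RecipientLeftException extends RuntimeException {
        RecipientLeftException(String message) {
            super(message);
        }
    }

    /** Strips CompletionException wrappers to reach the underlying cause. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /**
     * Returns a future that completes normally when the send succeeded or failed
     * only because the recipient left; any other failure is propagated.
     */
    static CompletableFuture<Void> tolerateRecipientLeft(CompletableFuture<Void> sendFuture) {
        return sendFuture.<Void>handle((res, err) -> {
            if (err == null || unwrap(err) instanceof RecipientLeftException) {
                return null; // Recipient is gone: nothing left to deliver, not a fatal error.
            }
            throw new CompletionException(unwrap(err));
        });
    }

    public static void main(String[] args) {
        // Simulates a send that failed because the recipient left the topology.
        CompletableFuture<Void> failedSend =
                CompletableFuture.failedFuture(new RecipientLeftException("IGN-NETWORK-5"));

        // join() would rethrow if the failure had been propagated.
        tolerateRecipientLeft(failedSend).join();
        System.out.println("Listener future completed normally");
    }
}
```

A listener would return the wrapped future instead of the raw send future, so the caller only sees failures that are unrelated to a node leaving. Whether swallowing the exception is safe depends on what the message was for, which is why a systemic approach may still be preferable.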