[ 
https://issues.apache.org/jira/browse/IGNITE-23803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy resolved IGNITE-23803.
----------------------------------------
    Fix Version/s: 3.0
       Resolution: Fixed

Fixed by IGNITE-24112

> Node leave causes a critical failure on Catalog update application
> ------------------------------------------------------------------
>
>                 Key: IGNITE-23803
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23803
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0
>
>
> A node leaving the topology caused a failure in CatalogManagerImpl (see Log 
> chunk 1). It seems that, in reaction to the application of some Catalog 
> event, a Catalog listener tried to send a message to another node. The 
> recipient node left the topology, so the send failed with 
> RecipientLeftException, and the failure propagated to the CatalogManager, 
> which treats any failure of any of its listeners as a fatal error and 
> notifies the FailureHandler. Even worse, the same exception then reached 
> WatchProcessor, which halted processing of subsequent Metastorage events 
> (see Log chunk 2).
> 'Recipient left' should not be fatal for the node as a whole. Such exceptions 
> should be handled carefully in Catalog listeners, or a systematic approach 
> should be devised.
>  
> Log chunk 1:
>  
> 2024-11-27 15:56:57:316 +0000 
> [WARNING][%ReproducerTest_cluster_2%connection-maintenance-0][CatalogManagerImpl]
>  Failed to apply catalog update.
> java.util.concurrent.CompletionException: 
> org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 
> TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>     at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
>     at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>     at 
> org.apache.ignite.internal.raft.RaftGroupServiceImpl.handleThrowable(RaftGroupServiceImpl.java:641)
>     at 
> org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$49(RaftGroupServiceImpl.java:618)
>     at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>     at 
> org.apache.ignite.internal.network.OutNetworkObject.failAcknowledgement(OutNetworkObject.java:95)
>     at 
> org.apache.ignite.internal.network.recovery.RecoveryDescriptor.dispose(RecoveryDescriptor.java:265)
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.blockAndDisposeDescriptor(ConnectionManager.java:679)
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:647)
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.lambda$handleNodeLeft$9(ConnectionManager.java:623)
>     at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.ignite.internal.network.RecipientLeftException: 
> IGN-NETWORK-5 TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:644)
>     ... 7 more
> 2024-11-27 15:56:57:317 +0000 
> [INFO][%ReproducerTest_cluster_2%connection-maintenance-0][WatchProcessor] 
> Notification chain encountered an error, so no notifications will be ever 
> fired for subsequent revisions until a restart. Notifying the FailureManager
> 2024-11-27 15:56:57:318 +0000 
> [ERROR][%ReproducerTest_cluster_2%connection-maintenance-0][FailureManager] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR]
> org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 
> TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:644)
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.lambda$handleNodeLeft$9(ConnectionManager.java:623)
>     at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-11-27 15:56:57:318 +0000 
> [WARNING][%ReproducerTest_cluster_2%connection-maintenance-0][UpdateLogImpl] 
> Unable to process catalog event
> org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 
> TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:644)
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.lambda$handleNodeLeft$9(ConnectionManager.java:623)
>     at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at 
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
>  
> Log chunk 2:
>  
> 2024-11-27 15:56:57:317 +0000 
> [INFO][%Gg41011ReproducerTest_cluster_2%connection-maintenance-0][WatchProcessor]
>  Notification chain encountered an error, so no notifications will be ever 
> fired for subsequent revisions until a restart. Notifying the FailureManager
> 2024-11-27 15:56:57:318 +0000 
> [ERROR][%Gg41011ReproducerTest_cluster_2%connection-maintenance-0][FailureManager]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR]
> org.apache.ignite.internal.network.RecipientLeftException: IGN-NETWORK-5 
> TraceId:61b649e8-3f0b-48b7-9bc1-e2f5a9060cbb
>     at 
> org.apache.ignite.internal.network.netty.ConnectionManager.disposeRecoveryDescriptorsOfLeftNode(ConnectionManager.java:644)
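
The listener-side handling that the quoted description asks for could look roughly like the sketch below. It is an illustration only, not the fix made in IGNITE-24112. The names RecipientLeftSketch, unwrap and tolerateRecipientLeft are hypothetical, and the nested RecipientLeftException is a stand-in for the Ignite class of the same name so that the example compiles with the JDK alone. The idea is to unwrap the CompletionException layers seen in Log chunk 1 and complete the listener future normally when the only problem is that the recipient left, while letting every other failure propagate.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class RecipientLeftSketch {
    /** Hypothetical stand-in for org.apache.ignite.internal.network.RecipientLeftException. */
    public static class RecipientLeftException extends RuntimeException {
        public RecipientLeftException(String message) {
            super(message);
        }
    }

    /** Strips the CompletionException wrappers that CompletableFuture stages add. */
    public static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /**
     * Returns a future that completes normally when the listener's send failed only
     * because the recipient left the topology. Any other failure still propagates.
     */
    public static CompletableFuture<Void> tolerateRecipientLeft(CompletableFuture<Void> listenerFuture) {
        return listenerFuture.<Void>handle((res, err) -> {
            if (err == null || unwrap(err) instanceof RecipientLeftException) {
                return null; // Success, or a departed recipient: not fatal for this node.
            }
            throw new CompletionException(unwrap(err)); // Everything else remains a failure.
        });
    }

    public static void main(String[] args) {
        // Simulates a send that failed the way Log chunk 1 shows: wrapped in CompletionException.
        CompletableFuture<Void> send = CompletableFuture.failedFuture(
                new CompletionException(new RecipientLeftException("recipient left the topology")));

        CompletableFuture<Void> tolerated = tolerateRecipientLeft(send);

        System.out.println("completed exceptionally: " + tolerated.isCompletedExceptionally());
    }
}
```

Wrapping each listener's future this way keeps the decision local to the listener. A systemic alternative, as the description suggests, would apply the same filter once at the point where listener futures are aggregated.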



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
