[
https://issues.apache.org/jira/browse/IGNITE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743752#comment-16743752
]
Stanislav Lukyanov commented on IGNITE-8728:
--------------------------------------------
Looked at the logs.
What happened is that when the second node joined, the first node (the one that
was online) iterated over its own indexes in order to validate the second
node's indexes. While doing so, the first node found a repeated index name
among its own indexes and halted, because that is a totally unexpected
situation.
There are two issues here:
1. Why were there two indexes with the same name? We need to figure out the
steps leading to that and close that hole.
2. The handling of duplicate indexes is very unfortunate. The node
shouldn't halt: even with a duplicate index name, we can try to
continue working (and will probably succeed) instead of just crashing. We also
don't print any useful debug info.
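To illustrate the second point, here is a minimal sketch of the difference between a strict duplicate check and a tolerant one. It is not Ignite's actual code: `Index`, `strictByName` and `tolerantByName` are hypothetical names, and the strict variant is only modeled on the "Duplicate key" message seen in the stack trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateIndexSketch {
    /** Hypothetical stand-in for an index definition (not Ignite's QueryIndex). */
    static final class Index {
        final String name;
        final String field;

        Index(String name, String field) {
            this.name = name;
            this.field = field;
        }
    }

    /** Strict variant: fails on the first repeated name and says nothing about which one. */
    static Map<String, Index> strictByName(List<Index> idxs) {
        Map<String, Index> byName = new LinkedHashMap<>();

        for (Index idx : idxs) {
            if (byName.put(idx.name, idx) != null)
                throw new IllegalStateException("Duplicate key");
        }

        return byName;
    }

    /**
     * Tolerant variant: keeps the first index seen for each name and records
     * every repeated name, with enough detail to debug, in {@code duplicates}.
     */
    static Map<String, Index> tolerantByName(List<Index> idxs, List<String> duplicates) {
        Map<String, Index> byName = new LinkedHashMap<>();

        for (Index idx : idxs) {
            Index prev = byName.putIfAbsent(idx.name, idx);

            if (prev != null)
                duplicates.add(idx.name + " [kept field=" + prev.field + ", ignored field=" + idx.field + "]");
        }

        return byName;
    }

    public static void main(String[] args) {
        List<Index> idxs = Arrays.asList(
            new Index("IDX_A", "name"),
            new Index("IDX_B", "age"),
            new Index("IDX_A", "city")); // Repeated name.

        List<String> duplicates = new ArrayList<>();
        Map<String, Index> byName = tolerantByName(idxs, duplicates);

        // The caller can log a warning and keep going instead of halting the node.
        System.out.println("Indexes kept: " + byName.keySet());
        System.out.println("Duplicate index names: " + duplicates);
    }
}
```

Whether keeping the first definition is the right policy for a real schema patch is an open question; the point is only that the duplicate can be reported with its name and handled without terminating the discovery worker.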
> Baselined node rejoining crashes other baseline nodes - Duplicate Key Error
> ---------------------------------------------------------------------------
>
> Key: IGNITE-8728
> URL: https://issues.apache.org/jira/browse/IGNITE-8728
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.7
> Reporter: Mahesh Renduchintala
> Priority: Critical
> Attachments: NS1_ignite-9676df15.0.log, NS2_ignite-7cfc8008.0.log,
> node-config.xml
>
>
> I have two nodes on which we have 3 partitioned tables. Indexes are
> also built on these tables.
> For 24 hours the caches work fine. The tables are definitely distributed across
> both nodes.
> Node 2 reboots due to some issue, goes out of the baseline, then comes back and
> rejoins the baseline. The other baseline nodes crash, and in the logs we see a
> Duplicate key error:
> [10:38:35,437][INFO][tcp-disco-srvr-#2|#2][TcpDiscoverySpi] TCP discovery
> accepted incoming connection [rmtAddr=/192.168.1.7, rmtPort=45102]
> [10:38:35,437][INFO][tcp-disco-srvr-#2|#2][TcpDiscoverySpi] TCP discovery
> spawning a new thread for connection [rmtAddr=/192.168.1.7, rmtPort=45102]
> [10:38:35,437][INFO][tcp-disco-sock-reader-#12|#12][TcpDiscoverySpi] Started
> serving remote node connection [rmtAddr=/192.168.1.7:45102, rmtPort=45102]
> [10:38:35,451][INFO][tcp-disco-sock-reader-#12|#12][TcpDiscoverySpi]
> Finished serving remote node connection [rmtAddr=/192.168.1.7:45102,
> rmtPort=45102
> [10:38:35,457][SEVERE][tcp-disco-msg-worker-#3|#3][TcpDiscoverySpi]
> TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node
> in order to prevent cluster wide instability.
> *java.lang.IllegalStateException: Duplicate key*
> at org.apache.ignite.cache.QueryEntity.checkIndexes(QueryEntity.java:223)
> at org.apache.ignite.cache.QueryEntity.makePatch(QueryEntity.java:174)
> at
> org.apache.ignite.internal.processors.query.QuerySchema.makePatch(QuerySchema.java:114)
> at
> org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor.makeSchemaPatch(DynamicCacheDescriptor.java:360)
> at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.validateNode(GridCacheProcessor.java:2536)
> at
> org.apache.ignite.internal.managers.GridManagerAdapter$1.validateNode(GridManagerAdapter.java:566)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processJoinRequestMessage(ServerImpl.java:3629)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2736)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2536)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6775)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2621)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> [10:38:35,459][SEVERE][tcp-disco-msg-worker-#3|#3][] Critical system error
> detected. Will be handled accordingly to configured handler [hnd=class
> o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
> Duplicate key]]
> java.lang.IllegalStateException: Duplicate key
> at org.apache.ignite.cache.QueryEntity.checkIndexes(QueryEntity.java:223)
> at org.apache.ignite.cache.QueryEntity.makePatch(QueryEntity.java:174)
> at
> org.apache.ignite.internal.processors.query.QuerySchema.makePatch(QuerySchema.java:114)
> at
> org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor.makeSchemaPatch(DynamicCacheDescriptor.java:360)
> at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.validateNode(GridCacheProcessor.java:2536)
> at
> org.apache.ignite.internal.managers.GridManagerAdapter$1.validateNode(GridManagerAdapter.java:566)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processJoinRequestMessage(ServerImpl.java:3629)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2736)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2536)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6775)
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2621)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> [10:38:35,460][SEVERE][tcp-disco-msg-worker-#3|#3][] JVM will be halted
> immediately due to the failure: [failureCtx=FailureContext
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
> Duplicate key]]
>