[jira] [Commented] (IGNITE-8728) Baselined node rejoining crashes other baseline nodes - Duplicate Key Error

Stanislav Lukyanov (JIRA) Wed, 16 Jan 2019 00:39:34 -0800


    [ 
https://issues.apache.org/jira/browse/IGNITE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743752#comment-16743752
 ]


Stanislav Lukyanov commented on IGNITE-8728:
--------------------------------------------

Looked at the logs.

What happened is that the first node (the one that was online) tried to iterate 
over its own indexes when the second node joined to validate the second node's 
indexes. And at that time the first node detected that there is a repeating 
index name in its indexes - and halted because that's a totally unexpected 
situation.

There are two issues here
1. Why were there two indexes of the same name? We need to figure out what are 
the steps leading to that and cover that hole.
2. The handling of the duplicate indexes is very unfortunate. The node 
shouldn't halt - even though we have a duplicate index name, we can try to 
continue to work (and probably successful) instead of just crashing. We also 
don't print any useful debug info.

> Baselined node rejoining crashes other baseline nodes - Duplicate Key Error
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-8728
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8728
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.7
>            Reporter: Mahesh Renduchintala
>            Priority: Critical
>         Attachments: NS1_ignite-9676df15.0.log, NS2_ignite-7cfc8008.0.log, 
> node-config.xml
>
>
> I have two nodes on which we have 3 tables which are partitioned.  Index are 
> also built on these tables. 
> For 24 hours caches work fine.  The tables are definitely distributed across 
> both the nodes
> Node 2 reboots due to some issue - goes out of the baseline - comes back and 
> joins the baseline.  Other baseline nodes crash and in the logs we see 
> duplicate Key error
> [10:38:35,437][INFO][tcp-disco-srvr-#2|#2][TcpDiscoverySpi] TCP discovery 
> accepted incoming connection [rmtAddr=/192.168.1.7, rmtPort=45102]
>  [10:38:35,437][INFO][tcp-disco-srvr-#2|#2][TcpDiscoverySpi] TCP discovery 
> spawning a new thread for connection [rmtAddr=/192.168.1.7, rmtPort=45102]
>  [10:38:35,437][INFO][tcp-disco-sock-reader-#12|#12][TcpDiscoverySpi] Started 
> serving remote node connection [rmtAddr=/192.168.1.7:45102, rmtPort=45102]
>  [10:38:35,451][INFO][tcp-disco-sock-reader-#12|#12][TcpDiscoverySpi] 
> Finished serving remote node connection [rmtAddr=/192.168.1.7:45102, 
> rmtPort=45102
>  [10:38:35,457][SEVERE][tcp-disco-msg-worker-#3|#3][TcpDiscoverySpi] 
> TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node 
> in order to prevent cluster wide instability.
>  *java.lang.IllegalStateException: Duplicate key*
> at org.apache.ignite.cache.QueryEntity.checkIndexes(QueryEntity.java:223)
> at org.apache.ignite.cache.QueryEntity.makePatch(QueryEntity.java:174)
> at 
> org.apache.ignite.internal.processors.query.QuerySchema.makePatch(QuerySchema.java:114)
> at 
> org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor.makeSchemaPatch(DynamicCacheDescriptor.java:360)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.validateNode(GridCacheProcessor.java:2536)
> at 
> org.apache.ignite.internal.managers.GridManagerAdapter$1.validateNode(GridManagerAdapter.java:566)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processJoinRequestMessage(ServerImpl.java:3629)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2736)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2536)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6775)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2621)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>  [10:38:35,459][SEVERE][tcp-disco-msg-worker-#3|#3][] Critical system error 
> detected. Will be handled accordingly to configured handler [hnd=class 
> o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: 
> Duplicate key]]
> java.lang.IllegalStateException: Duplicate key
> at org.apache.ignite.cache.QueryEntity.checkIndexes(QueryEntity.java:223)
> at org.apache.ignite.cache.QueryEntity.makePatch(QueryEntity.java:174)
> at 
> org.apache.ignite.internal.processors.query.QuerySchema.makePatch(QuerySchema.java:114)
> at 
> org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor.makeSchemaPatch(DynamicCacheDescriptor.java:360)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.validateNode(GridCacheProcessor.java:2536)
> at 
> org.apache.ignite.internal.managers.GridManagerAdapter$1.validateNode(GridManagerAdapter.java:566)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processJoinRequestMessage(ServerImpl.java:3629)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2736)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2536)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6775)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2621)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>  [10:38:35,460][SEVERE][tcp-disco-msg-worker-#3|#3][] JVM will be halted 
> immediately due to the failure: [failureCtx=FailureContext 
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: 
> Duplicate key]]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (IGNITE-8728) Baselined node rejoining crashes other baseline nodes - Duplicate Key Error

Reply via email to