[
https://issues.apache.org/jira/browse/IGNITE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070359#comment-16070359
]
Sergey Chugunov edited comment on IGNITE-5401 at 7/3/17 9:46 AM:
-----------------------------------------------------------------
This hang was caused by a very specific scenario that can occur in a
multi-node cluster setup.
The root of the problem is that marshaller mappings can be added to the local
map on each node in two different ways: they may be read from the file system
(from the *%IGNITE_HOME%/marshaller* directory) or created during the mapping
exchange process (which involves exchanging proposed/accepted Custom Discovery
Messages, CDMs).
Loading from the file system is a *local* operation: when a node reads a
mapping, it doesn't notify other nodes, and it creates the mapping in the
*accepted* state right away.
At the same time, the exchange protocol has an optimization that marks proposed
messages as duplicates to reduce CDM traffic in the ring.
What happened in the test, and what made it hang, is that one node loaded a
mapping from disk while another node requested adding the same mapping via the
exchange protocol. The second node's proposed message was marked as a duplicate
and skipped, so no accept message was sent, and the second node waited for the
accepted message forever.
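
The scenario above can be illustrated with a small, self-contained Java
sketch. This is not Ignite code: the class MappingExchangeSketch, the Node
type, the process() method and the ackDuplicates flag are all hypothetical
names made up for illustration, and the ring is reduced to two nodes. The
ackDuplicates=true branch is only a contrast case showing that the waiting
future completes once an accept is delivered; it is an assumption, not a
description of the fix applied in IGNITE-5401.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class MappingExchangeSketch {
    enum State { PROPOSED, ACCEPTED }

    /** Hypothetical per-node view of marshaller mappings (not an Ignite class). */
    static final class Node {
        final Map<Integer, State> mappings = new ConcurrentHashMap<>();
        final Map<Integer, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

        /** Local load from disk: the mapping is ACCEPTED at once, nobody is notified. */
        void loadFromDisk(int typeId) {
            mappings.put(typeId, State.ACCEPTED);
        }

        /** Starts an exchange; the future completes only when an accept arrives. */
        CompletableFuture<Void> propose(int typeId) {
            mappings.put(typeId, State.PROPOSED);
            return pending.computeIfAbsent(typeId, id -> new CompletableFuture<>());
        }

        /** Handles an accept message for a mapping this node proposed. */
        void onAccepted(int typeId) {
            mappings.put(typeId, State.ACCEPTED);
            CompletableFuture<Void> fut = pending.remove(typeId);
            if (fut != null)
                fut.complete(null);
        }
    }

    /** The receiver handles a proposal; returns true if an accept was sent back. */
    static boolean process(Node receiver, Node proposer, int typeId, boolean ackDuplicates) {
        boolean duplicate = receiver.mappings.containsKey(typeId);
        if (duplicate && !ackDuplicates)
            return false; // duplicate optimization: message skipped, no accept sent
        receiver.mappings.put(typeId, State.ACCEPTED);
        proposer.onAccepted(typeId);
        return true;
    }

    /** Replays the scenario; returns true if the proposer's wait finishes in time. */
    static boolean registrationCompletes(boolean ackDuplicates) {
        Node a = new Node();
        Node b = new Node();
        a.loadFromDisk(42);                          // node A reads the mapping from disk
        CompletableFuture<Void> fut = b.propose(42); // node B proposes the same mapping
        process(a, b, 42, ackDuplicates);
        try {
            fut.get(200, TimeUnit.MILLISECONDS);     // the real test waited without a timeout
            return true;
        } catch (TimeoutException e) {
            return false;                            // the accept never arrived
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("duplicate skipped, completes: " + registrationCompletes(false));
        System.out.println("duplicate acknowledged, completes: " + registrationCompletes(true));
    }
}
```

In the sketch the wait is bounded by a timeout so that the program terminates;
in the stack trace quoted below, the corresponding wait is the unbounded
GridFutureAdapter.get() call made from MarshallerContextImpl.registerClassName().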
> Investigate hangs in JDBC driver testIndexState()
> -------------------------------------------------
>
> Key: IGNITE-5401
> URL: https://issues.apache.org/jira/browse/IGNITE-5401
> Project: Ignite
> Issue Type: Task
> Components: jdbc, sql
> Reporter: Vladimir Ozerov
> Assignee: Sergey Chugunov
> Fix For: 2.1
>
>
> Two JDBC tests hang from time to time. The root cause is the same, as the
> tests are similar.
> 1) org.apache.ignite.jdbc.thin.JdbcThinDynamicIndexAbstractSelfTest#testIndexState
> 2) org.apache.ignite.internal.jdbc2.JdbcDynamicIndexAbstractSelfTest#testIndexState
> Failures only happen in ATOMIC PARTITIONED caches (with and without
> "near").
> Stack trace:
> {noformat}
> [17:37:00] : [Step 4/5] Thread [name="test-runner-#22990%thin.JdbcThinDynamicIndexAtomicPartitionedSelfTest%", id=29018, state=WAITING, blockCnt=0, waitCnt=4]
> [17:37:00] : [Step 4/5] at sun.misc.Unsafe.park(Native Method)
> [17:37:00] : [Step 4/5] at java.util.concurrent.locks.LockSupport.park(LockSupport.java:315)
> [17:37:00] : [Step 4/5] at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:176)
> [17:37:00] : [Step 4/5] at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
> [17:37:00] : [Step 4/5] at o.a.i.i.MarshallerContextImpl.registerClassName(MarshallerContextImpl.java:262)
> [17:37:00] : [Step 4/5] at o.a.i.i.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:780)
> [17:37:00] : [Step 4/5] at o.a.i.i.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:757)
> [17:37:00] : [Step 4/5] at o.a.i.i.binary.BinaryContext.descriptorForClass(BinaryContext.java:628)
> [17:37:00] : [Step 4/5] at o.a.i.i.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:164)
> [17:37:00] : [Step 4/5] at o.a.i.i.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
> [17:37:00] : [Step 4/5] at o.a.i.i.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
> [17:37:00] : [Step 4/5] at o.a.i.i.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:248)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.binary.CacheObjectBinaryProcessorImpl.marshalToBinary(CacheObjectBinaryProcessorImpl.java:371)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.binary.CacheObjectBinaryProcessorImpl.toBinary(CacheObjectBinaryProcessorImpl.java:849)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.binary.CacheObjectBinaryProcessorImpl.toCacheObject(CacheObjectBinaryProcessorImpl.java:799)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.GridCacheContext.toCacheObject(GridCacheContext.java:1769)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapSingleUpdate(GridNearAtomicSingleUpdateFuture.java:546)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:451)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:440)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1161)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:650)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2329)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.distributed.near.GridNearAtomicCache.put(GridNearAtomicCache.java:444)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2306)
> [17:37:00] : [Step 4/5] at o.a.i.i.processors.cache.IgniteCacheProxy.put(IgniteCacheProxy.java:1494)
> [17:37:00] : [Step 4/5] at o.a.i.jdbc.thin.JdbcThinDynamicIndexAbstractSelfTest.testIndexState(JdbcThinDynamicIndexAbstractSelfTest.java:273)
> [17:37:00] : [Step 4/5] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [17:37:00] : [Step 4/5] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> [17:37:00] : [Step 4/5] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [17:37:00] : [Step 4/5] at java.lang.reflect.Method.invoke(Method.java:606)
> [17:37:00] : [Step 4/5] at junit.framework.TestCase.runTest(TestCase.java:176)
> [17:37:00] : [Step 4/5] at o.a.i.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1963)
> [17:37:00] : [Step 4/5] at o.a.i.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:130)
> [17:37:00] : [Step 4/5] at o.a.i.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1878)
> [17:37:00] : [Step 4/5] at java.lang.Thread.run(Thread.java:745)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)