[ 
https://issues.apache.org/jira/browse/IGNITE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070359#comment-16070359
 ] 

Sergey Chugunov edited comment on IGNITE-5401 at 7/3/17 9:46 AM:
-----------------------------------------------------------------

This hang was caused by a very specific scenario that can occur in a 
multi-node cluster setup.

The scenario arises because marshaller mappings can be added to the local map 
on each node in two different ways: they may be read from the file system (from 
the *%IGNITE_HOME%/marshaller* directory) or created during the mapping exchange 
process (which involves exchanging proposed/accepted Custom Discovery Messages, 
or CDMs).

Loading from the file system is a *local* operation: when a node reads a 
mapping, it doesn't notify other nodes, and it creates the mapping in the 
*accepted* state right away.
At the same time, the exchange protocol has an optimization that marks proposed 
messages as duplicates to reduce CDM traffic in the ring.

What happened in the test and caused the hang is that one node loaded a 
mapping from disk while another node requested adding the same mapping via the 
exchange protocol. The second node's proposed message was marked as a duplicate 
and skipped, so no accepted message was sent, and the second node waited for it 
forever.
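
The sequence above can be illustrated with a small toy model. This is only a 
sketch under assumptions, not the Ignite implementation: the class and method 
names (MappingHangSketch, Node, loadFromDisk, onProposed, register) are 
hypothetical, the ring is reduced to a single peer, and the wait is bounded by a 
timeout so that the example terminates, whereas the wait in the test was 
unbounded.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of the hang; none of these names are Ignite APIs. */
final class MappingHangSketch {
    /** One node's local map: type ID -> class name, for mappings in accepted state. */
    static final class Node {
        final Map<Integer, String> accepted = new ConcurrentHashMap<>();

        /** Local-only operation: the mapping becomes accepted, nobody else is notified. */
        void loadFromDisk(int typeId, String clsName) {
            accepted.put(typeId, clsName);
        }

        /** Handles a proposed message; completing the future stands in for the accept. */
        void onProposed(int typeId, String clsName, CompletableFuture<Void> requesterFut) {
            if (accepted.containsKey(typeId))
                return; // Duplicate optimization: message is skipped, no accept is sent.

            accepted.put(typeId, clsName);
            requesterFut.complete(null);
        }
    }

    /** Proposes a mapping to the peer; returns true if an accept arrived in time. */
    static boolean register(Node peer, int typeId, String clsName, long timeoutMs) {
        CompletableFuture<Void> fut = new CompletableFuture<>();

        peer.onProposed(typeId, clsName, fut);

        try {
            fut.get(timeoutMs, TimeUnit.MILLISECONDS);

            return true;
        }
        catch (TimeoutException e) {
            return false; // With an unbounded get() this is where the requester hangs.
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node fresh = new Node();
        System.out.println("accepted: " + register(fresh, 1, "com.example.Person", 200));

        Node loaded = new Node();
        loaded.loadFromDisk(1, "com.example.Person");
        System.out.println("accepted: " + register(loaded, 1, "com.example.Person", 200));
    }
}
```

In this model the first call gets its accept, while the second one times out 
because the peer already holds the mapping and silently drops the proposal.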


> Investigate hangs in JDBC driver testIndexState()
> -------------------------------------------------
>
>                 Key: IGNITE-5401
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5401
>             Project: Ignite
>          Issue Type: Task
>          Components: jdbc, sql
>            Reporter: Vladimir Ozerov
>            Assignee: Sergey Chugunov
>             Fix For: 2.1
>
>
> Two JDBC tests hang from time to time. The root cause is the same, as the 
> tests are similar.
> 1) org.apache.ignite.jdbc.thin.JdbcThinDynamicIndexAbstractSelfTest#testIndexState
> 2) org.apache.ignite.internal.jdbc2.JdbcDynamicIndexAbstractSelfTest#testIndexState
> Failures only happen in ATOMIC PARTITIONED caches (with and without "near").
> Stack trace:
> {noformat}
> [17:37:00] :   [Step 4/5] Thread [name="test-runner-#22990%thin.JdbcThinDynamicIndexAtomicPartitionedSelfTest%", id=29018, state=WAITING, blockCnt=0, waitCnt=4]
> [17:37:00] :   [Step 4/5]         at sun.misc.Unsafe.park(Native Method)
> [17:37:00] :   [Step 4/5]         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:315)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:176)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.MarshallerContextImpl.registerClassName(MarshallerContextImpl.java:262)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:780)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:757)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.binary.BinaryContext.descriptorForClass(BinaryContext.java:628)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:164)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:248)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.binary.CacheObjectBinaryProcessorImpl.marshalToBinary(CacheObjectBinaryProcessorImpl.java:371)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.binary.CacheObjectBinaryProcessorImpl.toBinary(CacheObjectBinaryProcessorImpl.java:849)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.binary.CacheObjectBinaryProcessorImpl.toCacheObject(CacheObjectBinaryProcessorImpl.java:799)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.GridCacheContext.toCacheObject(GridCacheContext.java:1769)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapSingleUpdate(GridNearAtomicSingleUpdateFuture.java:546)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:451)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:440)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1161)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:650)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2329)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.distributed.near.GridNearAtomicCache.put(GridNearAtomicCache.java:444)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2306)
> [17:37:00] :   [Step 4/5]         at o.a.i.i.processors.cache.IgniteCacheProxy.put(IgniteCacheProxy.java:1494)
> [17:37:00] :   [Step 4/5]         at o.a.i.jdbc.thin.JdbcThinDynamicIndexAbstractSelfTest.testIndexState(JdbcThinDynamicIndexAbstractSelfTest.java:273)
> [17:37:00] :   [Step 4/5]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [17:37:00] :   [Step 4/5]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> [17:37:00] :   [Step 4/5]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [17:37:00] :   [Step 4/5]         at java.lang.reflect.Method.invoke(Method.java:606)
> [17:37:00] :   [Step 4/5]         at junit.framework.TestCase.runTest(TestCase.java:176)
> [17:37:00] :   [Step 4/5]         at o.a.i.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1963)
> [17:37:00] :   [Step 4/5]         at o.a.i.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:130)
> [17:37:00] :   [Step 4/5]         at o.a.i.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1878)
> [17:37:00] :   [Step 4/5]         at java.lang.Thread.run(Thread.java:745)
> {noformat}



