[
https://issues.apache.org/jira/browse/IGNITE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504922#comment-16504922
]
Alexey Goncharuk commented on IGNITE-8657:
------------------------------------------
[~sergey-chugunov], I think I've found an issue in the tests.
Take a look at the latest run of Binary Objects (Simple Mapper Basic):
https://ci.ignite.apache.org/viewLog.html?buildId=1367214&buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperBasic&tab=buildResultsDiv
I see the following assertion error in the log:
{code}
[16:30:59]W: [org.apache.ignite:ignite-core] java.lang.AssertionError: TcpDiscoveryNode [id=d089379e-11db-453f-99a0-a270bc200002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=341, intOrder=172, lastExchangeTime=1528378258963, loc=false, ver=2.6.0#20180607-sha1:8f8efe4f, isClient=false]
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.IgniteNeedReconnectException.<init>(IgniteNeedReconnectException.java:38)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.forceClientReconnect(GridDhtPartitionsExchangeFuture.java:2051)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1569)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:138)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:345)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:325)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2837)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2816)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.lang.Thread.run(Thread.java:745)
{code}
It looks like the exception may be deserialized on a non-client node, so the
assertion should be removed and the exception handled properly on receive.
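
A minimal sketch of that suggestion, for illustration only. The types below (Node, NeedReconnectException, ReconnectHandling) are hypothetical stand-ins, not Ignite's actual classes. The idea is that the exception constructor carries no assertion about the node type, and the receiving side decides what to do based on whether the local node is a client.

```java
/** Hypothetical stand-in for a cluster node; not Ignite's ClusterNode. */
final class Node {
    final String id;
    final boolean client;

    Node(String id, boolean client) {
        this.id = id;
        this.client = client;
    }
}

/** Hypothetical stand-in for a "need reconnect" exception. */
final class NeedReconnectException extends RuntimeException {
    NeedReconnectException(Node node) {
        // No 'assert node.client' here: in this sketch the object may also be
        // created (for example, when unmarshalled) on a server node.
        super("Node needs to reconnect [nodeId=" + node.id + ']');
    }
}

/** Hypothetical receive-side handling. */
final class ReconnectHandling {
    /** Decides on the receiving side how to react to the exception. */
    static String onReceive(Node locNode, NeedReconnectException e) {
        if (locNode.client)
            return "reconnect"; // Only a client acts on the reconnect request.

        return "ignore"; // A server does not fail; it just skips the request.
    }
}
```

With this shape, constructing the exception on a server node does not trip an assertion, and only a client node reacts by reconnecting.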
> Simultaneous start of bunch of client nodes may lead to some clients hangs
> --------------------------------------------------------------------------
>
> Key: IGNITE-8657
> URL: https://issues.apache.org/jira/browse/IGNITE-8657
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Sergey Chugunov
> Assignee: Sergey Chugunov
> Priority: Major
> Fix For: 2.6
>
>
> h3. Description
> PartitionExchangeManager uses the system property
> *IGNITE_EXCHANGE_HISTORY_SIZE* to cap the number of exchange objects it keeps
> and thereby limit memory consumption.
> The default value of the property is 1000, but in scenarios with many caches
> and partitions it is reasonable to set the exchange history size to a smaller
> value, around a few dozen.
> If a user then starts more client nodes at once than the history size, some
> clients may hang because their exchange information has been evicted from the
> history and is no longer available.
> h3. Workarounds
> Two workarounds are possible:
> * Do not start more clients at once than the history size.
> * Restart the hanging client node.
> h3. Solution
> Forcing a client node to reconnect when the server detects that the client's
> exchange information has been lost prevents client nodes from hanging.
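
The eviction scenario and the proposed solution in the issue description can be sketched roughly as follows. This is a simplified model under stated assumptions, not Ignite's implementation: ExchangeHistory and its methods are hypothetical names, and the history is modelled as an insertion-ordered map that drops its oldest entry once a maximum size is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded exchange history keyed by topology version. */
final class ExchangeHistory {
    private final Map<Long, String> hist;

    ExchangeHistory(final int maxSize) {
        // Insertion-ordered map that evicts the oldest entry once maxSize is exceeded.
        hist = new LinkedHashMap<Long, String>() {
            @Override protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Records an exchange for a joining client. */
    void onClientJoin(long topVer, String clientId) {
        hist.put(topVer, clientId);
    }

    /** Returns the action a server would take for a message that refers to topVer. */
    String onClientMessage(long topVer) {
        // If the entry was evicted, ask the client to reconnect instead of
        // leaving it waiting for an exchange that no longer exists.
        return hist.containsKey(topVer) ? "process" : "force-reconnect";
    }
}
```

In this model, with a history size of 2 and three clients joining at topology versions 1, 2 and 3, the entry for version 1 is evicted, so a later message for version 1 yields "force-reconnect" while messages for versions 2 and 3 yield "process".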