[
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428181#comment-16428181
]
Vladimir Ozerov commented on IGNITE-7944:
-----------------------------------------
[~guseinov], [~dpavlov], the patch looks simple to me, but I still have some
doubts:
1) With the patch we no longer cancel child tasks on disconnect, is that right?
Could that leave some futures uncompleted?
2) When the client is not connected, we do not throw an exception but simply
return from the {{send}} method. The ack closure is not notified either. Can we
throw an exception instead? What would be the consequences?
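To make the second question concrete, here is a minimal sketch of the two behaviours. It is not the Ignite code: {{DisconnectedSendSketch}}, {{ClientDisconnectedException}}, {{sendSilently}}, {{sendOrThrow}} and {{completing}} are hypothetical names, and the ack closure is modelled as a plain {{Consumer<Exception>}} that completes a {{CompletableFuture}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical stand-in for a message sender; this is NOT Ignite's GridIoManager. */
class DisconnectedSendSketch {
    /** Hypothetical exception type used only in this sketch. */
    static class ClientDisconnectedException extends Exception {
        ClientDisconnectedException(String msg) { super(msg); }
    }

    /** Assumed connection flag; a real client would track this from discovery events. */
    private final boolean connected;

    DisconnectedSendSketch(boolean connected) { this.connected = connected; }

    /** Variant A: return silently when disconnected; the ack closure is never called. */
    void sendSilently(String msg, Consumer<Exception> ackC) {
        if (!connected)
            return; // The caller cannot tell that nothing was sent.

        ackC.accept(null); // Stands in for a successful send.
    }

    /** Variant B: notify the ack closure with the error, then throw. */
    void sendOrThrow(String msg, Consumer<Exception> ackC) throws ClientDisconnectedException {
        if (!connected) {
            ClientDisconnectedException e =
                new ClientDisconnectedException("Client is disconnected, message not sent: " + msg);

            ackC.accept(e);

            throw e;
        }

        ackC.accept(null); // Stands in for a successful send.
    }

    /** Builds an ack closure that completes the given future with the send result. */
    static Consumer<Exception> completing(CompletableFuture<Void> fut) {
        return err -> {
            if (err == null)
                fut.complete(null);
            else
                fut.completeExceptionally(err);
        };
    }

    public static void main(String[] args) {
        DisconnectedSendSketch io = new DisconnectedSendSketch(false);

        CompletableFuture<Void> silent = new CompletableFuture<>();
        io.sendSilently("JOB_CANCEL", completing(silent));
        System.out.println("silent variant, future done: " + silent.isDone());

        CompletableFuture<Void> failed = new CompletableFuture<>();
        try {
            io.sendOrThrow("JOB_CANCEL", completing(failed));
        }
        catch (ClientDisconnectedException e) {
            System.out.println("throwing variant: " + e.getMessage());
        }
        System.out.println("throwing variant, future done: " + failed.isDone());
    }
}
```

In this sketch the silent variant leaves the future uncompleted, which is the concern raised in both questions, while the throwing variant completes it exceptionally but forces every caller to handle the new exception.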
> Disconnected client node tries to send JOB_CANCEL message
> ---------------------------------------------------------
>
> Key: IGNITE-7944
> URL: https://issues.apache.org/jira/browse/IGNITE-7944
> Project: Ignite
> Issue Type: Bug
> Components: messaging
> Affects Versions: 1.9, 2.3
> Reporter: Roman Guseinov
> Assignee: Roman Guseinov
> Priority: Major
> Fix For: 2.5
>
> Attachments: Reproducer7944.java
>
>
> If the network is blocked (socket connections are not closed) and a failure is
> detected, the tcp-client-disco-msg-worker thread can get stuck while creating
> a TcpClient:
> {code:java}
> "tcp-client-disco-msg-worker-#4%wd5prsvtots0016a-tg-QueryFabric%" #494 prio=5 os_prio=0 tid=0x00007f94c067c800 nid=0x2bdf runnable [0x00007f960ecf1000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.Net.poll(Native Method)
> at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
> - locked <0x00007fa140f520c0> (a java.lang.Object)
> at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:110)
> - locked <0x00007fa140f520b0> (a java.lang.Object)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2950)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2681)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2568)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2429)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2393)
> at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1590)
> at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1659)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.cancelChildren(GridTaskWorker.java:1305)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1609)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1581)
> at org.apache.ignite.internal.processors.task.GridTaskProcessor.onDisconnected(GridTaskProcessor.java:168)
> at org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3460)
> at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:601)
> at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2407)
> at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2386)
> at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1714)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> It looks like the msg-worker is trying to send a JOB_CANCEL message for each
> job, each with a timeout equal to failureDetectionTimeout.
> A reproducer is attached.