[ 
https://issues.apache.org/jira/browse/IGNITE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368442#comment-15368442
 ] 

Denis Magda commented on IGNITE-3412:
-------------------------------------

[~ascherbakov], seems that I figured out why the client hangs and the 
interrupted flag is ignored. The reason is that we ignore 
{{InterruptedIOException}} in the loop of {{SocketWriter}}. Basically the 
following can happen:

1) {{SocketWriter}} is writing a chunk of data using {{spi.writeToSocket(sock, 
msg, socketTimeout)}};
2) {{Ignite}} is being closed and {{interrupt}} is executed for 
{{SocketWriter}} thread;
3) underlying I/O code that is executed by {{spi.writeToSocket(sock, msg, 
socketTimeout)}} sees that the thread has been interrupted and throws 
{{InterruptedIOExceptio}} and not {{InterruptedException}}!!! When 
{{InterruptedIOException}} is thrown the interrupted flag is cleared; 
4) since {{InterruptedIOException}} is just a kind of {{IOException}} then 
{{SocketWriter}} catches this exception, closes the socket, checks that the 
Thread is not interrupted and hangs at {{mux.wait()}}.

To fix the issue we must threat {{InterruptedIOException}} differently. Not 
only {{SocketWriter}} but other similar parts of the platform must be fixed. 
I'm assigning the ticket on myself and will finalize it in the nearest time.

BTW, do you have the logs at hands? I remember you were telling that before 
{{SocketWriter}} moved to the hanging state it threw some kind of IO Exception. 
What was that?

> Client instance hangs on close
> ------------------------------
>
>                 Key: IGNITE-3412
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3412
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Alexei Scherbakov
>            Assignee: Alexei Scherbakov
>             Fix For: 1.7
>
>         Attachments: threadDump.txt
>
>
> In some cases calling close on Ignite client instance will lead to deadlock.
> The deadlock happens because of the following.
> Socket writer is waiting for new messages.
> {code}
> "tcp-client-disco-sock-writer-#2%null%" #100 prio=6 os_prio=0 
> tid=0x000000005fad2800 nid=0x13bc in Object.wait() [0x0000000067d0e000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at java.lang.Object.wait(Object.java:502)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1051)
>       - locked <0x00000000863da2f8> (a java.lang.Object)
>       at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> The closing process is hanging because TcpDiscoverySPI waits while this 
> writer is terminated
> {code}
> "Thread-6" #29 prio=6 os_prio=0 tid=0x000000005a740000 nid=0x17e8 in 
> Object.wait() [0x000000006077e000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at java.lang.Thread.join(Thread.java:1245)
>       - locked <0x00000000863da010> (a 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter)
>       at java.lang.Thread.join(Thread.java:1319)
>       at 
> org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:4476)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.spiStop(ClientImpl.java:295)
>       at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStop(TcpDiscoverySpi.java:1905)
>       at 
> org.apache.ignite.internal.managers.GridManagerAdapter.stopSpi(GridManagerAdapter.java:325)
>       at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.stop(GridDiscoveryManager.java:1336)
>       at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1940)
>       at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1812)
>       at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2248)
>       - locked <0x00000000858e77a8> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>       at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2211)
>       at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322)
>       at org.apache.ignite.Ignition.stop(Ignition.java:224)
>       at org.apache.ignite.internal.IgniteKernal.close(IgniteKernal.java:2921)
>       at ru.sbrf.ggcod.loader.job.MainLoader.run(MainLoader.java:123)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> There is some raise that led to the situation when the writer is hanging on 
> {{Object.wait}} method ignoring interrupted flag that was set at some point 
> of time.
> The full thread dump is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to