[ 
https://issues.apache.org/jira/browse/IGNITE-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713611#comment-16713611
 ] 

Stanilovsky Evgeny commented on IGNITE-10469:
---------------------------------------------

Igor, I rechecked your test under version 2.7 and it looks like it works 
correctly. Can you recheck it?

[^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java]

> TcpCommunicationSpi does not break tcp connection after IdleConnectionTimeout 
> seconds of inactivity
> ---------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-10469
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10469
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.5, 2.6
>            Reporter: Igor Kamyshnikov
>            Assignee: Stanilovsky Evgeny
>            Priority: Major
>         Attachments: 2.6.0.txt, 
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java, 
> GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java, ignite_idle_test.zip
>
>
> TcpCommunicationSpi does not close TCP connections after they have been idle 
> for longer than the time configured in TcpCommunicationSpi#idleConnTimeout 
> (the default is 10 minutes).
> There are environments where idle TCP connections become unusable: 
> connections remain ESTABLISHED while actual data to be sent piles up in 
> Send-Q (according to netstat). For this reason the Ignite stack does not 
> recognize a communication problem for a considerable amount of time (~10-15 
> minutes), and it does not begin its reconnection procedure (heartbeats use 
> different TCP connections that are not idle and don't have this issue).
> I've discovered, though, that there is logic in the Ignite code to detect and 
> close idle connections, but due to a problem in the code it does not work 
> reliably.
> This is a test that _sometimes_ reproduces the problem.
> [^ignite_idle_test.zip] - full test project
> [^GridTcpCommunicationSpiIdleCommunicationTimeoutTest.java] - just test code
>  [^2.6.0.txt] - mvn clean install logs for test with Ignite 2.6.0
> What's the problem in the Ignite code?
> There are two loops in the Ignite code that have a chance to close idle 
> connections:
> 1) 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.CommunicationWorker#processIdle
>  - this one is executed every *IdleConnectionTimeout* milliseconds. (It can 
> close idle connections, but it typically concludes that the connection is 
> not idle because of the second loop.)
> 2) 
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#bodyInternal
>  -> 
> org.apache.ignite.internal.util.nio.GridNioServer.AbstractNioClientWorker#checkIdle
>  - this loop executes:
> {noformat}
> filterChain.onSessionIdleTimeout(ses); <-- does not actually close an idle connection
> // Update timestamp to avoid multiple notifications within one timeout interval.
> ses.resetSendScheduleTime(); <--- resets idle timer
> ses.bytesReceived(0);
> {noformat}
> ---
> To wrap up, maybe the whole approach should be reviewed:
>  - is it ok not to track message delivery time?
>  - is it ok not to do heartbeating using the same connections as for 
> get/put/... commands?
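
The interaction between the two loops described in the quoted report can be illustrated with a small, self-contained model. This is only a sketch under assumptions: the Session class, the two check methods and the simulated clock are hypothetical stand-ins and are not Ignite classes or Ignite code. It assumes that both checks use the same idle timestamp and that the outcome depends on which check happens to run first, which would be consistent with the test reproducing the problem only sometimes.

```java
public class IdleCheckSketch {
    /** Hypothetical stand-in for a NIO session; not an Ignite class. */
    static final class Session {
        long lastActivity; // time of the last send/receive, in ms
        boolean closed;

        long idleTime(long now) {
            return now - lastActivity;
        }
    }

    /** Models the first loop: closes the session if it looks idle. */
    static void closingCheck(Session ses, long now, long timeout) {
        if (!ses.closed && ses.idleTime(now) >= timeout)
            ses.closed = true;
    }

    /** Models the second loop: notifies about idleness, then resets the timer without closing. */
    static boolean notifyingCheck(Session ses, long now, long timeout) {
        if (!ses.closed && ses.idleTime(now) >= timeout) {
            ses.lastActivity = now; // reset so the notification is not repeated
            return true;
        }
        return false;
    }

    /** Runs both checks over simulated time and reports whether the session ended up closed. */
    static boolean simulate(boolean notifierRunsFirst, long timeout, long tick, long duration) {
        Session ses = new Session();

        for (long now = tick; now <= duration; now += tick) {
            if (notifierRunsFirst) {
                notifyingCheck(ses, now, timeout);
                closingCheck(ses, now, timeout);
            }
            else {
                closingCheck(ses, now, timeout);
                notifyingCheck(ses, now, timeout);
            }
        }

        return ses.closed;
    }

    public static void main(String[] args) {
        System.out.println("notifier first -> closed: " + simulate(true, 1000L, 100L, 10_000L));
        System.out.println("closer first   -> closed: " + simulate(false, 1000L, 100L, 10_000L));
    }
}
```

In this model the session is never closed when the notifying check runs first, because the closing check always sees a freshly reset timer. Whether the real loops interleave this way in a given run depends on thread scheduling, which the sketch does not attempt to reproduce.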



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
