[
https://issues.apache.org/jira/browse/IGNITE-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pavel Tupitsyn updated IGNITE-16462:
------------------------------------
Description:
*Why*
TCP connections can enter a [half-open
state|https://en.wikipedia.org/wiki/TCP_half-open]: the connection seems to be
alive, but any attempt to send data will fail. Long-lived and mostly idle
connections are especially susceptible to this behavior.
The retry mechanism ([IEP-82 Thin Client Retry
Policy|https://cwiki.apache.org/confluence/display/IGNITE/IEP-82+Thin+Client+Retry+Policy])
in thin client implementations partially mitigates the issue. However, not all
operations are safe to retry, and reconnecting affects performance.
To improve connection stability and detect failures early, we can add a
keep-alive mechanism.
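A minimal sketch of what such an application-level keep-alive could look like.
All names (Heartbeat, send_and_wait, tick) are hypothetical and are not part of
the Ignite thin client API. The heartbeat is assumed to be a full round trip
with a timeout, because a write to a half-open connection can succeed locally
and only the missing response reveals the failure.

```python
import time


class Heartbeat:
    """Triggers a heartbeat round trip when a connection has been idle too long.

    Hypothetical helper for illustration; not the Ignite thin client API.
    """

    def __init__(self, send_and_wait, interval, clock=time.monotonic):
        # send_and_wait: callable that performs one heartbeat round trip and
        # raises OSError (including timeouts) if the peer does not answer.
        self._send_and_wait = send_and_wait
        self._interval = interval  # maximum allowed idle time, in seconds
        self._clock = clock
        self._last_activity = clock()
        self.broken = False

    def on_activity(self):
        """Record regular traffic so that busy connections skip heartbeats."""
        self._last_activity = self._clock()

    def tick(self):
        """Call periodically. Returns True if a heartbeat succeeded just now."""
        if self.broken or self._clock() - self._last_activity < self._interval:
            return False
        try:
            self._send_and_wait()
        except OSError:
            self.broken = True  # the caller should close and reconnect
            return False
        self._last_activity = self._clock()
        return True
```

In a client, tick() would typically run on a timer, and on_activity() would be
called after every successful request, so heartbeats are only sent on idle
connections.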
*Why not TCP keepalive*
TCP has a [built-in keepalive
mechanism|https://en.wikipedia.org/wiki/Keepalive], but it has some
disadvantages:
* Optional (may not be present in some TCP stacks)
* May not be handled well by some routers (RFC 1122, section 4.2.3.6)
* Default idle timeout is too long (2 hours), and it is problematic to adjust
on the SDK versions that are in use in Ignite (Java 8, .NET Standard 2.0), or
hard to get right in some languages (Python, JS).
Because of that, some protocols implement keepalive logic at a higher level
(for example, SMB). More details:
https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
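To illustrate the last disadvantage, here is a hedged sketch of tuning TCP
keepalive from Python. The helper name enable_tcp_keepalive and its default
values are assumptions made for this example. The per-connection options have
different names on different platforms and may be missing entirely, so the
code has to probe for them and cannot guarantee that any timing was applied.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Enable TCP keepalive and tune it where the platform allows.

    Returns the list of timing settings that were actually applied.
    Hypothetical helper for illustration only.
    """
    # SO_KEEPALIVE itself is widely available, but only switches the feature
    # on with the OS default timings (commonly 2 hours of idle time).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Option names differ per platform: Linux uses TCP_KEEPIDLE for the idle
    # time, while macOS calls it TCP_KEEPALIVE (exposed only by newer Python
    # versions). Any of these constants may be absent from the socket module.
    options = [
        (("TCP_KEEPIDLE", "TCP_KEEPALIVE"), idle_s, "idle"),
        (("TCP_KEEPINTVL",), interval_s, "interval"),
        (("TCP_KEEPCNT",), probes, "probes"),
    ]
    applied = []
    for names, value, label in options:
        for name in names:
            opt = getattr(socket, name, None)
            if opt is None:
                continue  # constant not defined on this platform
            try:
                sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            except OSError:
                continue  # constant exists, but the OS rejected it
            applied.append(label)
            break
    return applied
```

A caller would have to inspect the returned list to learn which timings took
effect, which is part of why a protocol-level keep-alive message is simpler to
reason about.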
was:
*Why*
TCP connections can enter a [half-open
state|https://en.wikipedia.org/wiki/TCP_half-open]: the connection seems to be
alive, but any attempt to send data will fail. Long-lived and mostly idle
connections are especially susceptible to this behavior.
The retry mechanism ([IEP-82 Thin Client Retry
Policy|https://cwiki.apache.org/confluence/display/IGNITE/IEP-82+Thin+Client+Retry+Policy])
in thin client implementations partially mitigates the issue. However, not all
operations are safe to retry, and reconnecting affects performance.
To improve connection stability and detect failures early, we can add a
keep-alive mechanism.
*Why not TCP keepalive*
TCP has a [built-in keepalive
mechanism|https://en.wikipedia.org/wiki/Keepalive], but it has some
disadvantages:
* Optional (may not be present in some TCP stacks)
* May not be handled well by some routers (RFC 1122, section 4.2.3.6)
* Default idle timeout is too long (2 hours), and it is problematic to adjust
on the SDK versions that are in use in Ignite (Java 8, .NET Standard 2.0), or
hard to get right in some languages (Python, JS).
> Thin client: add keep-alive message to detect half-open connections
> -------------------------------------------------------------------
>
> Key: IGNITE-16462
> URL: https://issues.apache.org/jira/browse/IGNITE-16462
> Project: Ignite
> Issue Type: Improvement
> Components: platforms, thin client
> Reporter: Pavel Tupitsyn
> Assignee: Pavel Tupitsyn
> Priority: Major
> Fix For: 2.13
>
>
> *Why*
> TCP connections can enter a [half-open
> state|https://en.wikipedia.org/wiki/TCP_half-open]: the connection seems to
> be alive, but any attempt to send data will fail. Long-lived and mostly idle
> connections are especially susceptible to this behavior.
> The retry mechanism ([IEP-82 Thin Client Retry
> Policy|https://cwiki.apache.org/confluence/display/IGNITE/IEP-82+Thin+Client+Retry+Policy])
> in thin client implementations partially mitigates the issue. However, not
> all operations are safe to retry, and reconnecting affects performance.
> To improve connection stability and detect failures early, we can add a
> keep-alive mechanism.
> *Why not TCP keepalive*
> TCP has a [built-in keepalive
> mechanism|https://en.wikipedia.org/wiki/Keepalive], but it has some
> disadvantages:
> * Optional (may not be present in some TCP stacks)
> * May not be handled well by some routers (RFC 1122, section 4.2.3.6)
> * Default idle timeout is too long (2 hours), and it is problematic to adjust
> on the SDK versions that are in use in Ignite (Java 8, .NET Standard 2.0), or
> hard to get right in some languages (Python, JS).
> Because of that, some protocols implement keepalive logic at a higher level
> (for example, SMB). More details:
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html