[ https://issues.apache.org/jira/browse/FLINK-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-16030:
-----------------------------------
Labels: auto-unassigned stale-major (was: auto-unassigned)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Major but is unassigned and neither itself nor its Sub-Tasks have been updated
for 30 days. I have gone ahead and added a "stale-major" label to the issue. If
this ticket is Major, please either assign yourself or give an update.
Afterwards, please remove the label or in 7 days the issue will be
deprioritized.
> Add heartbeat between netty server and client to detect long connection alive
> -----------------------------------------------------------------------------
>
> Key: FLINK-16030
> URL: https://issues.apache.org/jira/browse/FLINK-16030
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Affects Versions: 1.7.2, 1.8.3, 1.9.2, 1.10.0
> Reporter: begginghard
> Priority: Major
> Labels: auto-unassigned, stale-major
>
> As reported on [the user mailing
> list|https://lists.apache.org/[email protected]:lte=1M:Encountered%20error%20while%20consuming%20partitions],
> the network can fail in many ways, some of them subtle (e.g. a high
> packet-loss ratio).
> When the long-lived TCP connection between the Netty client and server is
> lost, the server fails to send its response to the client and shuts down the
> channel. The Netty client, however, does not know that the connection has
> been broken, so it keeps waiting for up to two hours.
> There are two ways to detect whether a long-lived TCP connection between the
> Netty client and server is still alive: TCP keepalive and an application-level
> heartbeat.
>
> With TCP keepalive, the default idle time before the first probe is 2 hours.
> When a long-lived TCP connection dies, the Netty client therefore keeps
> waiting for at least 2 hours before it triggers an exception and enters
> failover recovery.
> For faster detection, Netty provides IdleStateHandler, on top of which a
> ping-pong mechanism can be built: if the Netty client sends n consecutive ping
> messages without receiving a single pong, it triggers an exception.
>
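A ping-pong heartbeat like the one proposed in the issue can be modeled independently of Netty. The plain-Java sketch below is a hypothetical state machine: `HeartbeatMonitor` is not a Flink or Netty class, and the PING/PONG messages are assumed, since the server would have to answer each ping with a pong. It assumes an `IdleStateHandler(readerIdleTime, 0, 0, TimeUnit.SECONDS)` in the client pipeline, followed by a handler that calls `onReaderIdle()` from `userEventTriggered(...)` when it receives an `IdleStateEvent` whose `state()` is `IdleState.READER_IDLE`, and calls `onMessageReceived()` from `channelRead(...)`.

```java
/**
 * Hypothetical ping-pong heartbeat state machine (not a Flink or Netty class).
 * A Netty handler would call onReaderIdle() for each READER_IDLE IdleStateEvent
 * and onMessageReceived() for every inbound message.
 */
final class HeartbeatMonitor {

    /** What the calling handler should do after a reader-idle event. */
    enum Action { SEND_PING, CLOSE_AND_FAIL }

    private final int maxUnansweredPings; // "n" in the issue description
    private int unansweredPings = 0;

    HeartbeatMonitor(int maxUnansweredPings) {
        if (maxUnansweredPings < 1) {
            throw new IllegalArgumentException("maxUnansweredPings must be >= 1");
        }
        this.maxUnansweredPings = maxUnansweredPings;
    }

    /** Any inbound traffic (a pong or regular data) proves the peer is alive. */
    void onMessageReceived() {
        unansweredPings = 0;
    }

    /** Called once per reader-idle interval in which nothing was read. */
    Action onReaderIdle() {
        if (unansweredPings >= maxUnansweredPings) {
            // n pings sent and still nothing read: treat the connection as dead.
            return Action.CLOSE_AND_FAIL;
        }
        unansweredPings++;
        return Action.SEND_PING;
    }

    /**
     * Approximate time from the last inbound message until failure:
     * n idle intervals that each send a ping, plus one more waiting for the last pong.
     */
    static long detectionBoundSeconds(long readerIdleSeconds, int maxUnansweredPings) {
        return (maxUnansweredPings + 1L) * readerIdleSeconds;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(3);
        for (int i = 0; i < 4; i++) {
            System.out.println(monitor.onReaderIdle());
        }
        System.out.println(detectionBoundSeconds(10, 3));
    }
}
```

With a 10 s reader-idle time and n = 3, a dead connection would be reported roughly 40 s after the last inbound message, instead of the two-plus hours of default TCP keepalive. Because any inbound message resets the counter and READER_IDLE only fires when nothing is read, a busy connection never needs to send pings.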
--
This message was sent by Atlassian Jira
(v8.3.4#803005)