[ https://issues.apache.org/jira/browse/FLINK-23202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Wysakowicz updated FLINK-23202:
-------------------------------------
Release Note:
In the same way that Flink detects unreachable heartbeat targets faster, Flink now
also immediately fails RPCs whose target the OS already knows to be unreachable at
the network level, instead of waiting for a timeout (akka.ask.timeout).
One effect of this is faster task failover, because cancelling tasks on a dead
TaskExecutor is no longer delayed by the RPC timeout.
If this faster failover is a problem in certain setups (for example, setups that
rely on external systems hitting timeouts first), we recommend configuring the
application's restart strategy with a restart delay.
was:Flink now immediately fails RPC requests that cannot be delivered instead
of waiting for the `akka.ask.timeout`. As a result, certain operations, such as
cancelling tasks on a dead `TaskExecutor`, no longer delay the restart of jobs
by `akka.ask.timeout`. This might increase the number of restart attempts Flink
makes in a given time interval. Hence, if this behaviour becomes a problem, we
recommend adjusting the restart delay to compensate for the faster completion
of RPCs.
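The fail-fast behaviour described in the release note can be illustrated with a minimal, stdlib-only sketch. It assumes a hypothetical `FailFastRpcSketch` class (not Flink's `RpcService` API) and a hypothetical `markUnreachable` hook standing in for whatever network-level signal the real implementation receives: a request to a target already known to be unreachable fails at once, while any other request without a reply fails only after the ask timeout.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT Flink's RpcService API: it contrasts waiting for an
// ask timeout with failing fast when the target is known to be unreachable.
public class FailFastRpcSketch {

    /** Hypothetical exception used for messages that cannot be delivered. */
    public static class RecipientUnreachableException extends RuntimeException {
        public RecipientUnreachableException(String target) {
            super("Could not deliver message to " + target);
        }
    }

    // Targets that the (simulated) network layer has reported as unreachable.
    private final Set<String> unreachableTargets = ConcurrentHashMap.newKeySet();
    private final long askTimeoutMillis;

    public FailFastRpcSketch(long askTimeoutMillis) {
        this.askTimeoutMillis = askTimeoutMillis;
    }

    /** Hypothetical hook: records that a target is known to be unreachable. */
    public void markUnreachable(String target) {
        unreachableTargets.add(target);
    }

    /**
     * Sends a request. For a known-unreachable target the returned future is
     * already failed; otherwise it fails with a TimeoutException after
     * askTimeoutMillis unless a reply completes it first.
     */
    public CompletableFuture<String> ask(String target, String message) {
        if (unreachableTargets.contains(target)) {
            // Fail fast: no need to wait for the ask timeout.
            return CompletableFuture.failedFuture(new RecipientUnreachableException(target));
        }
        // There is no real transport in this sketch, so nothing ever replies
        // and only the timeout can complete this future.
        return new CompletableFuture<String>().orTimeout(askTimeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        FailFastRpcSketch rpc = new FailFastRpcSketch(500);
        rpc.markUnreachable("dead-taskexecutor");
        // The cancel request fails immediately instead of after 500 ms.
        System.out.println(rpc.ask("dead-taskexecutor", "cancelTask").isCompletedExceptionally());
    }
}
```

In Flink itself, the restart delay recommended above is configured through the restart strategy, for example `restart-strategy: fixed-delay` together with `restart-strategy.fixed-delay.delay` in `flink-conf.yaml`.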
> RpcService should fail result futures if messages could not be sent
> -------------------------------------------------------------------
>
> Key: FLINK-23202
> URL: https://issues.apache.org/jira/browse/FLINK-23202
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.13.1, 1.12.4
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
>
> The {{RpcService}} should fail result futures if messages could not be sent.
> This would speed up the failure detection mechanism because it would not rely
> on the timeout. One way to achieve this could be to listen to the dead
> letters and then send a {{Failure}} message back to the sender.
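The dead-letter idea from the description can be sketched in a few lines of plain Java. This is a simplified, hypothetical actor-style model (the `DeadLetterSketch`, `Envelope` and `Failure` classes are illustrative, not Akka or Flink APIs): in Akka, undeliverable messages are published as dead letters on the event stream, whereas here the router calls a dead-letter handler directly, and that handler answers the sender with a `Failure` so the pending future fails instead of waiting for a timeout.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical, simplified actor-style model; none of these classes are Akka or Flink APIs.
public class DeadLetterSketch {

    /** A request plus the future on which the sender waits for the reply. */
    public static final class Envelope {
        public final String recipient;
        public final Object payload;
        public final CompletableFuture<Object> replyTo = new CompletableFuture<>();

        public Envelope(String recipient, Object payload) {
            this.recipient = recipient;
            this.payload = payload;
        }
    }

    /** Hypothetical counterpart of the Failure message named in the issue. */
    public static final class Failure {
        public final Throwable cause;

        public Failure(Throwable cause) {
            this.cause = cause;
        }
    }

    // Currently running recipients, keyed by name.
    private final Map<String, Consumer<Envelope>> recipients = new ConcurrentHashMap<>();

    public void register(String name, Consumer<Envelope> behaviour) {
        recipients.put(name, behaviour);
    }

    public void stop(String name) {
        recipients.remove(name);
    }

    /** Sends a request and returns the future for its reply. */
    public CompletableFuture<Object> ask(String recipient, Object payload) {
        Envelope envelope = new Envelope(recipient, payload);
        Consumer<Envelope> target = recipients.get(recipient);
        if (target == null) {
            onDeadLetter(envelope); // the message cannot be delivered
        } else {
            target.accept(envelope);
        }
        return envelope.replyTo;
    }

    // Dead-letter listener: instead of dropping the message silently,
    // send a Failure back to the sender so its future fails right away.
    private void onDeadLetter(Envelope envelope) {
        reply(envelope, new Failure(new IllegalStateException(
                "Could not deliver message to " + envelope.recipient)));
    }

    /** Delivers a reply; a Failure completes the sender's future exceptionally. */
    public static void reply(Envelope envelope, Object response) {
        if (response instanceof Failure) {
            envelope.replyTo.completeExceptionally(((Failure) response).cause);
        } else {
            envelope.replyTo.complete(response);
        }
    }
}
```

For example, after `register("echo", env -> DeadLetterSketch.reply(env, env.payload))`, `ask("echo", "ping")` completes with `"ping"`; after `stop("echo")`, the same call returns an already failed future.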