[ https://issues.apache.org/jira/browse/FLINK-23202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Wysakowicz updated FLINK-23202:
-------------------------------------
    Release Note: 
Just as Flink detects unreachable heartbeat targets faster, it now also 
immediately fails RPCs whose target the OS already knows to be unreachable at 
the network level, instead of waiting for a timeout (akka.ask.timeout).

One effect of this is faster task failovers, because cancelling tasks on a dead 
TaskExecutor is no longer delayed by the RPC timeout.

If this faster failover is a problem in certain setups (which might rely on the 
fact that external systems hit timeouts), we recommend configuring the 
application's restart strategy with a restart delay.

  was:Flink now immediately fails RPC requests that cannot be delivered, instead 
of waiting for the `akka.ask.timeout`. As a result, certain operations, such as 
cancelling tasks on a dead `TaskExecutor`, no longer delay job restarts by 
`akka.ask.timeout`. This might increase the number of restart attempts Flink 
makes in a given time interval. Hence, if this behaviour becomes a problem, we 
recommend adjusting the restart delay to compensate for the faster completion 
of RPCs.


> RpcService should fail result futures if messages could not be sent
> -------------------------------------------------------------------
>
>                 Key: FLINK-23202
>                 URL: https://issues.apache.org/jira/browse/FLINK-23202
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.1, 1.12.4
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> The {{RpcService}} should fail result futures if messages could not be sent. 
> This would speed up the failure detection mechanism because it would not rely 
> on the timeout. One way to achieve this could be to listen for dead 
> letters and then send a {{Failure}} message back to the sender.
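The idea in the issue description can be illustrated with a toy, in-process model: each ask registers a pending reply future bounded by a timeout, and a message with no live recipient is routed to a dead-letter handler that completes that future with a failure right away, instead of leaving the sender to wait for the timeout. This is a minimal sketch under stated assumptions; `DeadLetterRpcSketch`, `RecipientUnreachableException` and the string-addressed endpoints are hypothetical and are neither Flink's `RpcService` nor Akka's dead-letter API.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Toy model of fail-fast RPCs (hypothetical names; not Flink's or Akka's API).
public class DeadLetterRpcSketch {

    /** Failure "reply" used when a message ends up as a dead letter. */
    static final class RecipientUnreachableException extends RuntimeException {
        RecipientUnreachableException(String address) {
            super("Could not deliver message to " + address);
        }
    }

    private final Map<String, Function<String, String>> endpoints = new ConcurrentHashMap<>();
    private final Map<Long, CompletableFuture<String>> pendingReplies = new ConcurrentHashMap<>();
    private final AtomicLong nextRequestId = new AtomicLong();
    private final long askTimeoutMillis;

    DeadLetterRpcSketch(long askTimeoutMillis) {
        this.askTimeoutMillis = askTimeoutMillis;
    }

    void register(String address, Function<String, String> handler) {
        endpoints.put(address, handler);
    }

    void unregister(String address) {
        endpoints.remove(address);
    }

    /** Sends a request; the future completes with the reply, a failure, or a timeout. */
    CompletableFuture<String> ask(String address, String message) {
        long requestId = nextRequestId.incrementAndGet();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pendingReplies.put(requestId, reply);
        // Without a failure reply, this timeout would be the only way the future fails.
        reply.orTimeout(askTimeoutMillis, TimeUnit.MILLISECONDS)
             .whenComplete((value, error) -> pendingReplies.remove(requestId));
        deliver(requestId, address, message); // synchronous delivery keeps the sketch simple
        return reply;
    }

    private void deliver(long requestId, String address, String message) {
        Function<String, String> handler = endpoints.get(address);
        if (handler == null) {
            onDeadLetter(requestId, address);
        } else {
            completePending(requestId, handler.apply(message), null);
        }
    }

    /** Dead-letter listener: send a failure back instead of letting the sender wait. */
    private void onDeadLetter(long requestId, String address) {
        completePending(requestId, null, new RecipientUnreachableException(address));
    }

    private void completePending(long requestId, String value, Throwable failure) {
        CompletableFuture<String> reply = pendingReplies.remove(requestId);
        if (reply == null) {
            return; // the request already timed out
        }
        if (failure != null) {
            reply.completeExceptionally(failure);
        } else {
            reply.complete(value);
        }
    }

    public static void main(String[] args) throws Exception {
        DeadLetterRpcSketch rpc = new DeadLetterRpcSketch(10_000);
        rpc.register("taskExecutor-1", msg -> "ack:" + msg);
        System.out.println(rpc.ask("taskExecutor-1", "cancelTask").get());

        rpc.unregister("taskExecutor-1"); // simulate a dead TaskExecutor
        CompletableFuture<String> cancel = rpc.ask("taskExecutor-1", "cancelTask");
        // Already failed, long before the 10 s ask timeout could fire.
        System.out.println(cancel.isCompletedExceptionally());
    }
}
```

Running `main` prints `ack:cancelTask` and then `true`: the second ask fails as soon as it is sent rather than after the 10 s timeout. The timeout is kept as a fallback for messages that are delivered but never answered.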



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
