[jira] [Updated] (FLINK-29769) Further limit the explosion range of failover in hybrid shuffle mode

Weijie Guo (Jira) Tue, 29 Nov 2022 04:48:03 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-29769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Weijie Guo updated FLINK-29769:
-------------------------------
    Description: Under the current failover strategy, if a region changes to 
the failed state, all of its downstream regions must be restarted. For 
ALL_EDGE_BLOCKING jobs this adds no overhead, because they are scheduled 
stage by stage. In hybrid shuffle mode, however, upstream and downstream 
tasks can run at the same time. If an upstream task fails, it should not 
affect downstream tasks that do not consume its output.  (was: Under the current 
failover strategy, if a region changes to the failed state, all its downstream 
regions must be restarted. For ALL_ EDGE_BLOCKING type jobs, since they are 
scheduled stage by state, no additional overhead. However, for the hybrid 
shuffle mode, the upstream and downstream can both run at the same time. If the 
upstream task fails, we hope that it will not affect the downstream regions 
that do not consume it.)

> Further limit the explosion range of failover in hybrid shuffle mode
> --------------------------------------------------------------------
>
>                 Key: FLINK-29769
>                 URL: https://issues.apache.org/jira/browse/FLINK-29769
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.17.0
>            Reporter: Weijie Guo
>            Priority: Major
>
> Under the current failover strategy, if a region changes to the failed state, 
> all of its downstream regions must be restarted. For ALL_EDGE_BLOCKING jobs 
> this adds no overhead, because they are scheduled stage by stage. In hybrid 
> shuffle mode, however, upstream and downstream tasks can run at the same 
> time. If an upstream task fails, it should not affect downstream tasks that 
> do not consume its output.
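
The goal described above can be illustrated with a minimal sketch. It is not
Flink's failover implementation and uses no Flink APIs: the class
FailoverSketch, the method tasksToRestart, and the task names A1, A2, B1, B2
are all hypothetical. It assumes the job is given as a map from each task to
the tasks that directly consume its output, and it restarts only the failed
task and its transitive consumers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Flink.
final class FailoverSketch {

    /**
     * Returns the failed task plus every task that transitively consumes its output.
     * The map goes from a task name to the tasks that directly read its result.
     */
    public static Set<String> tasksToRestart(
            String failedTask, Map<String, List<String>> consumers) {
        Set<String> toRestart = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(failedTask);
        while (!queue.isEmpty()) {
            String task = queue.poll();
            // Set.add returns false for tasks already visited, so each task is expanded once.
            if (toRestart.add(task)) {
                queue.addAll(consumers.getOrDefault(task, List.of()));
            }
        }
        return toRestart;
    }

    public static void main(String[] args) {
        // Assumed topology: B1 reads only from A1, B2 reads only from A2.
        Map<String, List<String>> consumers = Map.of(
                "A1", List.of("B1"),
                "A2", List.of("B2"));
        // A1 fails: B1 must restart, while A2 and B2 keep running.
        System.out.println(tasksToRestart("A1", consumers)); // prints [A1, B1]
    }
}
```

In this assumed topology, restarting every downstream task would also restart
B2, even though B2 never reads from A1. That unnecessary restart is what the
issue asks to avoid.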



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
