[ 
https://issues.apache.org/jira/browse/FLINK-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Yao updated FLINK-14822:
-----------------------------
    Description: The test fails because we exhaust the number of restarts (3). 
The reason is that the new scheduler may re-schedule tasks faster: it starts 
counting down the restart back-off time as soon as task cancellation is 
triggered, whereas the legacy scheduler only starts counting down after the 
task cancellation has finished. Thus, re-scheduled tasks may be deployed into 
a TM that was killed but not yet detected as lost, and each such deployment 
consumes another restart. How quickly the TM loss is detected depends on 
heartbeat.interval and heartbeat.timeout, which default to 10s and 50s 
respectively.   (was: The tests fails because we exhaust the number of 
restarts (3). )

> Enable 'Streaming File Sink end-to-end test' to pass with new DefaultScheduler
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-14822
>                 URL: https://issues.apache.org/jira/browse/FLINK-14822
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.10.0
>            Reporter: Gary Yao
>            Assignee: Gary Yao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test fails because we exhaust the number of restarts (3). The reason is 
> that the new scheduler may re-schedule tasks faster: it starts counting down 
> the restart back-off time as soon as task cancellation is triggered, whereas 
> the legacy scheduler only starts counting down after the task cancellation 
> has finished. Thus, re-scheduled tasks may be deployed into a TM that was 
> killed but not yet detected as lost, and each such deployment consumes 
> another restart. How quickly the TM loss is detected depends on 
> heartbeat.interval and heartbeat.timeout, which default to 10s and 50s 
> respectively. 
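
The timing argument above can be sketched with a rough toy model in Python. 
This is not Flink code, and it rests on several assumptions: the function 
names are hypothetical, the cancellation duration and back-off values are 
made-up numbers, a deployment into the dead TM is assumed to fail immediately, 
the new scheduler is assumed to wait for whichever of cancellation and 
back-off ends later, and the TM loss is assumed to be detected after 
heartbeat.timeout (50s).

```python
def legacy_cycle(cancel_s: float, backoff_s: float) -> float:
    # Legacy scheduler: back-off countdown starts after cancellation finishes.
    return cancel_s + backoff_s


def new_cycle(cancel_s: float, backoff_s: float) -> float:
    # New scheduler (assumed model): countdown starts when cancellation is
    # triggered, so both run concurrently and the longer one dominates.
    return max(cancel_s, backoff_s)


def restarts_consumed(cycle_s: float, detection_s: float) -> int:
    """Count restarts used up before the TM loss is detected (toy model)."""
    if cycle_s <= 0:
        raise ValueError("cycle_s must be positive")
    restarts = 0
    t = 0.0  # the TM is killed at t = 0, which triggers the first restart
    while t < detection_s:
        restarts += 1  # task fails (or lands on the dead TM) and restarts
        t += cycle_s   # next deployment attempt happens one cycle later
    return restarts


if __name__ == "__main__":
    cancel_s, backoff_s = 10, 10  # hypothetical values for illustration
    detection_s = 50              # assumed equal to heartbeat.timeout
    print("legacy:", restarts_consumed(legacy_cycle(cancel_s, backoff_s), detection_s))
    print("new:   ", restarts_consumed(new_cycle(cancel_s, backoff_s), detection_s))
```

With these illustrative numbers the legacy model uses 3 restarts and the new 
model uses 5, which would exceed a limit of 3. Real behavior depends on actual 
cancellation times and on how deployment failures surface.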



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
