[jira] [Updated] (FLINK-26400) Release Testing: Explicit shutdown signalling from TaskManager to JobManager

Niklas Semmler (Jira) Mon, 28 Feb 2022 08:30:13 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-26400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Niklas Semmler updated FLINK-26400:
-----------------------------------
    Description: 
FLINK-25277 introduces explicit signalling between a TaskManager and the 
JobManager when the TaskManager shuts down. This reduces the time it takes for 
a reactive cluster to down-scale & restart.

 

*Setup*
 # Add the following lines to your Flink configuration to enable reactive mode:
{code:java}
taskmanager.host: localhost # a workaround
scheduler-mode: reactive
restart-strategy: fixeddelay
restart-strategy.fixed-delay.attempts: 100 
{code}

 # Create a “usrlib” folder and place the TopSpeedWindowing jar into it
{code:java}
$ mkdir usrlib
$ cp examples/flink-examples-streaming/target/flink-examples-streaming_2.12-1.15-SNAPSHOT-TopSpeedWindowing.jar usrlib/
{code}

 # Start the job 
{code:java}
$ bin/standalone-job.sh start --main-class org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
{code}

 # Start three task managers
{code:java}
$ bin/taskmanager.sh start
$ bin/taskmanager.sh start
$ bin/taskmanager.sh start{code}

 # Wait for the job to stabilize. The log file should show three parallel tasks starting for every operator, for example:
{code:java}
 GlobalWindows -> Sink: Print to Std. Out (3/3) (d10339d5755d07f3d9864ed1b2147af2) switched from INITIALIZING to RUNNING.{code}
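
The wait in the last setup step can also be scripted. The following is a minimal sketch and not part of the original test plan: the helper name wait_for_log_line is hypothetical, and it assumes that the Flink logs are written as *.log files into a single directory (commonly log/ under the Flink home in a standalone setup, but verify this for your installation).

```shell
#!/usr/bin/env bash
# Sketch only: poll the log files in a directory until a line matches a pattern.
# wait_for_log_line is a hypothetical helper, not a Flink script.
#   $1 = extended regular expression to look for
#   $2 = directory containing *.log files (assumed layout)
#   $3 = timeout in seconds (default 60)
wait_for_log_line() {
  local pattern="$1"
  local log_dir="$2"
  local timeout="${3:-60}"
  local waited=0
  while true; do
    # grep -q exits 0 as soon as any file contains a matching line.
    # A missing directory or an empty glob counts as "not found yet".
    if grep -qE -- "$pattern" "$log_dir"/*.log 2>/dev/null; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For the third subtask of the example above, a call could look like {{wait_for_log_line '\(3/3\).*switched from INITIALIZING to RUNNING' log 120}}. The function returns 0 once the line appears and 1 if the timeout expires first.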

*Test*

Stop one TaskManager:

{{bin/taskmanager.sh stop}}

Success condition: The job should be cancelled and restarted within a few 
seconds. The logs should contain a line with the text “The TaskExecutor is 
shutting down”.
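
The log check in the success condition can be sketched as a small shell helper. This is an illustration under assumptions, not part of the original test plan: find_shutdown_signal is a hypothetical name, and the helper assumes all relevant logs are *.log files in one directory. It searches every log file because the description above does not say which component writes the message.

```shell
#!/usr/bin/env bash
# Sketch only: list the log files that contain the explicit shutdown message.
# find_shutdown_signal is a hypothetical helper, not a Flink script.
#   $1 = directory containing *.log files (assumed layout)
find_shutdown_signal() {
  local log_dir="$1"
  # -l prints only the names of matching files, -F treats the text literally.
  # The exit status is 0 if at least one file matches, non-zero otherwise.
  grep -lF -- 'The TaskExecutor is shutting down' "$log_dir"/*.log 2>/dev/null
}
```

Run as {{find_shutdown_signal log}} after stopping the TaskManager. A non-zero exit status means the message was not found in any log file.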

 

*Teardown*

Stop all taskmanagers and the jobmanager:

{{bin/standalone-job.sh stop}}
{{bin/taskmanager.sh stop-all}}

  was:
FLINK-25277 introduces explicit signalling between a TaskManager and the 
JobManager when the TaskManager shuts down. This reduces the time it takes for 
a reactive cluster to down-scale & restart.

 

*Setup*
 # Add the following line to your flink config to enable reactive mode:
{code:java}
taskmanager.host: localhost # a workaround
scheduler-mode: reactive
restart-strategy: fixeddelay
restart-strategy.fixed-delay.attempts: 100 
{code}
 # Create a “usrlib” folder and place the TopSpeedWindowing jar into it
{code:java}
$ mkdir usrlib
$ cp 
../../../../flink-examples/flink-examples-streaming/target/flink-examples-streaming_2.12-1.15-SNAPSHOT-TopSpeedWindowing.jar
 usrlib/{code}
 # Start the job 
{code:java}
$ bin/standalone-job.sh start  --main-class 
org.apache.flink.streaming.examples.windowing.TopSpeedWindowing {code}
 # Start three task managers
{code:java}
$ bin/taskmanager.sh start
$ bin/taskmanager.sh start
$ bin/taskmanager.sh start{code}
 # Wait for the job to stabilize. The log file should show that three tasks 
start for every operator.
{code:java}
 GlobalWindows -> Sink: Print to Std. Out (3/3) 
(d10339d5755d07f3d9864ed1b2147af2) switched from INITIALIZING to RUNNING.{code}

*Test*

Stop one taskmanager

{{bin/taskmanager.sh stop}}

Success condition: You should see that the job cancels and re-runs after a few 
seconds. In the logs you should find a line with the text “The TaskExecutor is 
shutting down”.

 

*Teardown*

Stop all taskmanagers and the jobmanager:

{{bin/standalone-job.sh stop}}
{{bin/taskmanager.sh stop-all}}


> Release Testing: Explicit shutdown signalling from TaskManager to JobManager
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-26400
>                 URL: https://issues.apache.org/jira/browse/FLINK-26400
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Niklas Semmler
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.15.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
