[ 
https://issues.apache.org/jira/browse/FLINK-28980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598422#comment-17598422
 ] 

Biao Liu commented on FLINK-28980:
----------------------------------

I've tested the scenario, and it looks good to me.

I started two TMs on different machines, each with two slots. To make one 
subtask much slower than the others (the operator has three subtasks), I added 
a hostname check based on "InetAddress.getLocalHost().getHostName()". I set 
"slow-task-detector.execution-time.baseline-ratio" to 0.5. The speculative 
task was launched as expected. I checked the web UI, metrics, logs, and the 
produced result, and everything worked fine. Screenshots and log files are in 
the attachments.
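
For readers who want to reproduce the hostname check, the following is a minimal 
sketch in plain Java, not the code used in the test above. The class name 
SlowHostCheck, the helpers isSlowHost and delayFor, and the placeholder host 
name are hypothetical; only InetAddress.getLocalHost().getHostName() comes from 
the description. In a real job the delay would presumably be applied inside the 
operator's user function.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, for illustration only (not part of Flink).
final class SlowHostCheck {

    // True when the current host is the one chosen to be slow.
    static boolean isSlowHost(String localHostName, String slowHostName) {
        return localHostName != null && localHostName.equals(slowHostName);
    }

    // Returns the artificial delay to apply on this host: delayMillis on the
    // slow host, 0 everywhere else.
    static long delayFor(String localHostName, String slowHostName, long delayMillis) {
        return isSlowHost(localHostName, slowHostName) ? delayMillis : 0L;
    }

    public static void main(String[] args) throws UnknownHostException, InterruptedException {
        // Same call as mentioned in the comment above.
        String local = InetAddress.getLocalHost().getHostName();
        // Assumption: the slow host name is passed as the first argument;
        // the default is a placeholder that should match no real host.
        String slow = args.length > 0 ? args[0] : "slow-host-placeholder";
        long delay = delayFor(local, slow, 1000L);
        Thread.sleep(delay);
        System.out.println(local + " delayed by " + delay + " ms");
    }
}
```

Running it with the local host name as the argument applies the delay; any 
other argument leaves the run undelayed.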

> Release Testing: Verify FLIP-168 speculative execution
> ------------------------------------------------------
>
>                 Key: FLINK-28980
>                 URL: https://issues.apache.org/jira/browse/FLINK-28980
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Zhu Zhu
>            Assignee: Biao Liu
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.16.0
>
>
> Speculative execution is introduced in Flink 1.16 to deal with temporarily slow 
> tasks caused by slow nodes. This feature currently consists of 4 FLIPs:
>  - FLIP-168: Speculative Execution core part
>  - FLIP-224: Blocklist Mechanism
>  - FLIP-245: Source Supports Speculative Execution
>  - FLIP-249: Flink Web UI Enhancement for Speculative Execution
> This ticket aims to verify FLIP-168, along with FLIP-224 and FLIP-249.
> More details about this feature and how to use it can be found in this 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/speculative_execution/].
> To do the verification, the process can be:
>  - Write a Flink job in which one subtask runs much slower than the others 
> (e.g. it sleeps indefinitely if it runs on a certain host, whose hostname can 
> be retrieved via InetAddress.getLocalHost().getHostName(), or if 
> (subtaskIndex + attemptNumber) % 2 == 0)
>  - Modify the Flink configuration file to enable speculative execution and 
> tune the configuration as you like
>  - Submit the job. Check the web UI, logs, metrics and produced result.
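
The second variant suggested in the description, (subtaskIndex + attemptNumber) 
% 2 == 0, can be sketched in plain Java as below. This is an illustration under 
assumptions, not code from the ticket: the class AttemptParitySlowdown and the 
method isSlowAttempt are hypothetical names, and it assumes that a speculative 
attempt of a subtask gets the next attempt number.

```java
// Hypothetical helper class, for illustration only (not part of Flink).
final class AttemptParitySlowdown {

    // An attempt is made slow when the sum of its subtask index and attempt
    // number is even, as suggested in the issue description.
    static boolean isSlowAttempt(int subtaskIndex, int attemptNumber) {
        return (subtaskIndex + attemptNumber) % 2 == 0;
    }

    public static void main(String[] args) {
        int parallelism = 3; // three subtasks, as in the test comment
        for (int subtask = 0; subtask < parallelism; subtask++) {
            for (int attempt = 0; attempt < 2; attempt++) {
                System.out.println("subtask " + subtask + ", attempt " + attempt
                        + ": " + (isSlowAttempt(subtask, attempt) ? "slow" : "fast"));
            }
        }
    }
}
```

Under that assumption, a first attempt that is slow is always followed by a 
fast next attempt, because the parity of the sum flips when the attempt number 
grows by one. That makes the predicate independent of which host the task 
lands on.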



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
