[
https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362264#comment-17362264
]
wangwj commented on FLINK-10644:
--------------------------------
[~trohrmann]
Hi Till.
I have finished the FLIP:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+execution+for+Batch+Job
Could you please help me review it first, or should I send an e-mail directly to
[email protected] for discussion?
Looking forward to your reply.
Thanks~
> Batch Job: Speculative execution
> --------------------------------
>
> Key: FLINK-10644
> URL: https://issues.apache.org/jira/browse/FLINK-10644
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Coordination
> Reporter: JIN SUN
> Assignee: BoWang
> Priority: Major
> Labels: stale-assigned
>
> Stragglers/outliers are tasks that run slower than most other tasks in a
> batch job. They hurt job latency, because a straggler is very likely to lie
> on the critical path of the job and become its bottleneck.
> Tasks may be slow for various reasons, including hardware degradation,
> software misconfiguration, or noisy neighbors. It is hard for the JM to
> predict the runtime.
> To reduce the overhead of stragglers, other systems such as Hadoop/Tez and
> Spark have *_speculative execution_*. Speculative execution is a health-check
> procedure that looks for tasks to speculate on, i.e. tasks in an
> ExecutionJobVertex (EJV) that are running slower than the median of all
> successfully completed tasks in that EJV. Such slow tasks are re-submitted to
> another TM. The slow task is not stopped; instead, a new copy runs in
> parallel, and the remaining copies are killed as soon as one of them
> completes.
> This JIRA is an umbrella for applying this kind of idea in Flink. Details will
> be appended later.
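The median-based detection step described in the issue can be sketched as follows. This is a minimal, hypothetical Python sketch under stated assumptions, not Flink's implementation: the function name `find_speculation_candidates`, the input shapes, and the `multiplier` threshold (similar in spirit to Spark's `spark.speculation.multiplier`) are all assumptions made for illustration. It only decides which running tasks of one EJV to speculate on; launching the copy on another TM and killing the losing attempts are left out.

```python
from statistics import median


def find_speculation_candidates(completed_durations, running_elapsed,
                                multiplier=1.5, min_completed=1):
    """Hypothetical helper: pick straggling tasks within one job vertex.

    completed_durations: runtimes (seconds) of successfully finished tasks.
    running_elapsed: task id -> seconds the task has been running so far.
    multiplier: assumed slack factor applied to the median (not a Flink option).
    min_completed: assumed minimum number of finished tasks before speculating.
    Returns the sorted ids of tasks running longer than multiplier * median.
    """
    # Without enough finished tasks the median is not a meaningful baseline.
    if len(completed_durations) < min_completed:
        return []
    threshold = multiplier * median(completed_durations)
    # Flag every running task that has already exceeded the threshold.
    return sorted(task_id for task_id, elapsed in running_elapsed.items()
                  if elapsed > threshold)


# Usage: median of finished tasks is 11.0 s, so the threshold is 16.5 s.
completed = [10.0, 12.0, 11.0]
running = {"t3": 9.0, "t4": 20.0, "t5": 16.5}
print(find_speculation_candidates(completed, running))  # ['t4']
```

Comparing against the median rather than the mean keeps a single very slow finished task from inflating the baseline; the multiplier adds slack so that tasks only slightly slower than typical are not duplicated needlessly.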
--
This message was sent by Atlassian Jira
(v8.3.4#803005)