[
https://issues.apache.org/jira/browse/FLINK-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhu Zhu updated FLINK-10240:
----------------------------
Summary: Pluggable scheduling strategy for batch jobs (was: Pluggable
scheduling strategy for batch job)
> Pluggable scheduling strategy for batch jobs
> --------------------------------------------
>
> Key: FLINK-10240
> URL: https://issues.apache.org/jira/browse/FLINK-10240
> Project: Flink
> Issue Type: New Feature
> Components: Distributed Coordination
> Reporter: Zhu Zhu
> Priority: Major
> Labels: scheduling
>
> Currently, batch jobs are scheduled with the LAZY_FROM_SOURCES strategy:
> source tasks are scheduled at the beginning, and other tasks are scheduled
> once their input data is consumable.
> However, consumable input data does not always mean the task can start
> working at once.
>
> One example is the hash join operation, where the operator first consumes one
> side (the build side) to set up a hash table, then consumes the other side
> (the probe side) to do the actual join work. If the probe side is started
> early, it just gets stuck on back pressure, because the join operator will
> not consume data from it before the build stage is done, which wastes
> resources.
> If the probe side task is started only after the build stage is done, both
> the build and probe sides can have more computing resources, since they are
> staggered.
>
> That's why we think a flexible scheduling strategy is needed, one that allows
> job owners to customize the vertex scheduling order and its constraints.
> Better resource utilization usually means better performance.
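
The hash join case described above can be illustrated with a small, self-contained Java sketch. It is not Flink's scheduler API: `Vertex`, `SchedulingStrategy`, `LAZY`, `withStartAfterFinished` and `startable` are hypothetical names invented for this example, and "build stage is done" is approximated as "the build-side source has finished". The sketch only shows how a pluggable strategy could add a "start X after Y has finished" constraint on top of a lazy-from-sources style rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PluggableSchedulingSketch {

    /** Hypothetical job vertex: a name plus the names of the vertices it reads from. */
    record Vertex(String name, List<String> inputs) {}

    /** Hypothetical plug-in point: may this vertex be started in the current job state? */
    interface SchedulingStrategy {
        boolean canStart(Vertex v, Set<String> started, Set<String> finished);
    }

    /** Mimics the lazy-from-sources idea: sources start at once, others once any input has started. */
    static final SchedulingStrategy LAZY = (v, started, finished) ->
            v.inputs().isEmpty() || v.inputs().stream().anyMatch(started::contains);

    /** Wraps a base strategy with extra "start key only after value has finished" constraints. */
    static SchedulingStrategy withStartAfterFinished(SchedulingStrategy base,
                                                     Map<String, String> startAfter) {
        return (v, started, finished) -> {
            String prerequisite = startAfter.get(v.name());
            boolean constraintMet = prerequisite == null || finished.contains(prerequisite);
            return constraintMet && base.canStart(v, started, finished);
        };
    }

    /** Returns the not-yet-started vertices that the strategy allows to start now. */
    static List<String> startable(List<Vertex> job, SchedulingStrategy strategy,
                                  Set<String> started, Set<String> finished) {
        List<String> result = new ArrayList<>();
        for (Vertex v : job) {
            if (!started.contains(v.name()) && strategy.canStart(v, started, finished)) {
                result.add(v.name());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Vertex> job = List.of(
                new Vertex("buildSource", List.of()),
                new Vertex("probeSource", List.of()),
                new Vertex("hashJoin", List.of("buildSource", "probeSource")));

        // Lazy rule: both sources start immediately, so the probe side may sit in back pressure.
        System.out.println(startable(job, LAZY, Set.of(), Set.of()));

        // Staggered rule: the probe side is held back until the build side has finished.
        SchedulingStrategy staggered =
                withStartAfterFinished(LAZY, Map.of("probeSource", "buildSource"));
        System.out.println(startable(job, staggered, Set.of(), Set.of()));
        System.out.println(startable(job, staggered, Set.of("buildSource"), Set.of()));
        System.out.println(startable(job, staggered,
                Set.of("buildSource", "hashJoin"), Set.of("buildSource")));
    }
}
```

With the lazy rule, both sources are startable at the beginning. With the staggered rule, only `buildSource` is startable at first, `hashJoin` becomes startable once `buildSource` has started, and `probeSource` becomes startable only after `buildSource` has finished.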
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)