[ https://issues.apache.org/jira/browse/FLINK-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882597#comment-16882597 ]
Zhu Zhu commented on FLINK-13163:
---------------------------------
Hi [~kkrugler], the input splits problem discussed above may cause a larger
regression within a single task failover, but it should not increase the chance
that a task failover happens.
To identify why increasing source parallelism leads to more failures, I think
we need to check the exception reported as the failure cause.
We can do that in a separate mail thread, since this JIRA may not be related.
> Support execution of batch jobs with fewer slots than requested
> ---------------------------------------------------------------
>
> Key: FLINK-13163
> URL: https://issues.apache.org/jira/browse/FLINK-13163
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Jeff Zhang
> Assignee: Till Rohrmann
> Priority: Major
> Fix For: 1.9.0
>
>
> Flink should be able to execute batch jobs with fewer slots than requested in
> a sequential manner.
> At the moment, however, we register a timeout for every slot request, which
> fires after {{slot.request.timeout}} and fails the slot request. Moreover, we
> fail the slot request if the {{ResourceManager}} fails to allocate new
> resources or if the slot request times out on the {{ResourceManager}}. This
> behavior makes sense if we know that we need all requested slots, because
> we then fail early if they cannot be fulfilled.
> However, for batch jobs it is not strictly required that all slot requests
> get fulfilled. It is enough to have at least one slot for every requested
> {{ResourceProfile}} (the set of slots (available + allocated) must contain a
> slot which can fulfill a slot request). If this is the case, then we should
> not fail the slot request but instead wait until the slot gets assigned to
> the request. If there is no such slot, then we should still time out the
> request.
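The rule proposed in the description above can be sketched roughly as follows. This is only an illustration of the decision, not Flink's implementation: {{Profile}}, {{canFulfill}} and {{shouldFailOnTimeout}} are hypothetical names, and the CPU/memory comparison is an assumed, simplified stand-in for how a {{ResourceProfile}} might be matched.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a resource profile (not Flink's class).
final class Profile {
    final double cpuCores;
    final int memoryMb;

    Profile(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    // Assumption: a slot can fulfill a request if it offers at least the
    // requested CPU and memory.
    boolean canFulfill(Profile requested) {
        return cpuCores >= requested.cpuCores && memoryMb >= requested.memoryMb;
    }
}

final class BatchSlotTimeoutSketch {

    // Called when the timeout of a pending batch slot request fires.
    // Returns true if the request should be failed, false if it should
    // keep waiting for a slot to be assigned.
    static boolean shouldFailOnTimeout(
            Profile requested,
            List<Profile> availableSlots,
            List<Profile> allocatedSlots) {
        // A matching slot that is currently free: keep waiting for assignment.
        for (Profile slot : availableSlots) {
            if (slot.canFulfill(requested)) {
                return false;
            }
        }
        // A matching slot that is currently in use: it can serve the request
        // once it is released, so keep waiting as well.
        for (Profile slot : allocatedSlots) {
            if (slot.canFulfill(requested)) {
                return false;
            }
        }
        // No known slot can ever fulfill this request: time it out.
        return true;
    }
}
```

Under these assumptions, a request for a small profile keeps waiting as long as one sufficiently large slot exists, even if that slot is currently allocated, while a request that no known slot can satisfy still times out.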
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)