[jira] [Commented] (FLINK-13163) Support execution of batch jobs with fewer slots than requested

Ken Krugler (JIRA) Wed, 10 Jul 2019 05:55:39 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882022#comment-16882022
 ]


Ken Krugler commented on FLINK-13163:
-------------------------------------

Hi [~zhuzh] - thanks for this report, and the notes. I've found that in my 
batch jobs, limiting source parallelism seems to help reduce the number of 
failures. Is there a way to determine (via logs) whether my issue(s) are 
related?

> Support execution of batch jobs with fewer slots than requested
> ---------------------------------------------------------------
>
>                 Key: FLINK-13163
>                 URL: https://issues.apache.org/jira/browse/FLINK-13163
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0
>            Reporter: Jeff Zhang
>            Assignee: Till Rohrmann
>            Priority: Major
>             Fix For: 1.9.0
>
>
> Flink should be able to execute batch jobs sequentially when fewer slots are
> available than requested.
> At the moment, however, we register a timeout for every slot request; it
> fires after {{slot.request.timeout}} and fails the request. Moreover, we fail
> the slot request if the {{ResourceManager}} cannot allocate new resources or
> if the request times out on the {{ResourceManager}}. This behavior makes
> sense when we know that all requested slots are needed, because it lets us
> fail early if the requests cannot be fulfilled.
> However, for batch jobs it is not strictly required that all slot requests
> get fulfilled. It is enough to have at least one slot for every requested
> {{ResourceProfile}}, i.e., the set of slots (available plus allocated) must
> contain a slot that can fulfill the request. If this is the case, we should
> not fail the slot request but instead wait until a suitable slot is assigned
> to it. If there is no such slot, we should still time out the request.
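A minimal Java sketch of the timeout rule proposed above, under stated assumptions: Profile, Slot and shouldFailOnTimeout are hypothetical, simplified stand-ins (not Flink's actual ResourceProfile or slot pool API), and "a slot can fulfill a request" is assumed to mean it offers at least the requested CPU and memory. It covers only the decision taken when the timeout fires, not the ResourceManager failure paths.

```java
import java.util.List;

/**
 * Hypothetical sketch of the batch slot-request timeout rule from FLINK-13163.
 * All types here are illustrative stand-ins, not Flink classes.
 */
public class BatchSlotTimeoutSketch {

    /** Assumed simplified resource profile: CPU cores and memory in MB. */
    record Profile(double cpuCores, int memoryMb) {
        /** Assumption: a slot can serve a request if it offers at least as much of each resource. */
        boolean canFulfill(Profile requested) {
            return cpuCores >= requested.cpuCores() && memoryMb >= requested.memoryMb();
        }
    }

    /** A slot known to the pool, either free or currently allocated to another request. */
    record Slot(Profile profile, boolean allocated) {}

    /**
     * Decides what to do when slot.request.timeout fires for a pending request.
     * Non-batch (current behavior): always fail, since every requested slot is needed.
     * Batch: fail only if no known slot, free or allocated, could ever fulfill the request.
     */
    static boolean shouldFailOnTimeout(Profile requested, List<Slot> knownSlots, boolean isBatchJob) {
        if (!isBatchJob) {
            return true;
        }
        // The allocated flag is deliberately ignored: an allocated slot is
        // returned to the pool once its task finishes, so it still counts.
        boolean fulfillable = knownSlots.stream()
                .anyMatch(slot -> slot.profile().canFulfill(requested));
        return !fulfillable;
    }

    public static void main(String[] args) {
        Profile small = new Profile(1.0, 1024);
        Profile large = new Profile(4.0, 8192);
        // A single slot exists and is busy with another task.
        List<Slot> slots = List.of(new Slot(small, true));

        System.out.println(shouldFailOnTimeout(small, slots, true));  // false: wait for the busy slot
        System.out.println(shouldFailOnTimeout(large, slots, true));  // true: no slot can ever fit
        System.out.println(shouldFailOnTimeout(small, slots, false)); // true: fail early as today
    }
}
```

Treating allocated slots as candidates is what lets a batch job run its tasks one after another on fewer slots, while a request no slot could ever satisfy still times out.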


