[ 
https://issues.apache.org/jira/browse/FLINK-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8886:
------------------------------------
    Component/s: Local Runtime
                 Distributed Coordination

> Job isolation via scheduling in shared cluster
> ----------------------------------------------
>
>                 Key: FLINK-8886
>                 URL: https://issues.apache.org/jira/browse/FLINK-8886
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination, Local Runtime, Scheduler
>    Affects Versions: 1.5.0
>            Reporter: Elias Levy
>            Priority: Major
>
> Flink's TaskManager executes tasks from different jobs within the same JMV as 
> threads.  We prefer to isolate different jobs on their on JVM.  Thus, we must 
> use different TMs for different jobs.  As currently the scheduler will 
> allocate task slots within a TM to tasks from different jobs, that means we 
> must stand up one cluster per job.  This is wasteful, as it requires at least 
> two JobManagers per cluster for high-availability, and the JMs have low 
> utilization.
> Additionally, different jobs may require different resources.  Some jobs are 
> compute heavy.  Some are IO heavy (lots of state in RocksDB).  At the moment 
> the scheduler threats all TMs are equivalent, except possibly in their number 
> of available task slots.  Thus, one is required to stand up multiple cluster 
> if there is a need for different types of TMs.
>  
> It would be useful if one could specify requirements on job, such that they 
> are only scheduled on a subset of TMs.  Properly configured, that would 
> permit isolation of jobs in a shared cluster and scheduling of jobs with 
> specific resource needs.
>  
> One possible implementation is to specify a set of tags on the TM config file 
> which the TMs used when registering with the JM, and another set of tags 
> configured within the job or supplied when submitting the job.  The scheduler 
> could then match the tags in the job with the tags in the TMs.  In a 
> restrictive mode the scheduler would assign a job task to a TM only if all 
> tags match.  In a relaxed mode the scheduler could assign a job task to a TM 
> if there is a partial match, while giving preference to a more accurate match.
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to