[
https://issues.apache.org/jira/browse/FLINK-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406052#comment-16406052
]
Till Rohrmann commented on FLINK-8886:
--------------------------------------
I think this is a good idea to achieve a lightweight resource isolation. There
is actually some work ongoing to be able to specify resource constraints for an
operator (see FLINK-5132 and FLINK-4373). Based on the resource requirements
the RM will pick a suitable slot which offers enough resources. What's missing
is the proper wiring to pass the {{ResourceSpec}} to the right places to
evaluate against the {{ResourceProfile}}.
I think it should be fairly simple to add a set of tags for the TMs and the CLI
when submitting a job.
> Job isolation via scheduling in shared cluster
> ----------------------------------------------
>
> Key: FLINK-8886
> URL: https://issues.apache.org/jira/browse/FLINK-8886
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination, Local Runtime, Scheduler
> Affects Versions: 1.5.0
> Reporter: Elias Levy
> Priority: Major
>
> Flink's TaskManager executes tasks from different jobs within the same JMV as
> threads. We prefer to isolate different jobs on their on JVM. Thus, we must
> use different TMs for different jobs. As currently the scheduler will
> allocate task slots within a TM to tasks from different jobs, that means we
> must stand up one cluster per job. This is wasteful, as it requires at least
> two JobManagers per cluster for high-availability, and the JMs have low
> utilization.
> Additionally, different jobs may require different resources. Some jobs are
> compute heavy. Some are IO heavy (lots of state in RocksDB). At the moment
> the scheduler threats all TMs are equivalent, except possibly in their number
> of available task slots. Thus, one is required to stand up multiple cluster
> if there is a need for different types of TMs.
>
> It would be useful if one could specify requirements on job, such that they
> are only scheduled on a subset of TMs. Properly configured, that would
> permit isolation of jobs in a shared cluster and scheduling of jobs with
> specific resource needs.
>
> One possible implementation is to specify a set of tags on the TM config file
> which the TMs used when registering with the JM, and another set of tags
> configured within the job or supplied when submitting the job. The scheduler
> could then match the tags in the job with the tags in the TMs. In a
> restrictive mode the scheduler would assign a job task to a TM only if all
> tags match. In a relaxed mode the scheduler could assign a job task to a TM
> if there is a partial match, while giving preference to a more accurate match.
>
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)