[
https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658946#comment-16658946
]
Till Rohrmann commented on FLINK-10640:
---------------------------------------
Thanks for opening this issue [~wuzang]. I agree with [~Tison] that such a new
feature needs a more in-depth discussion. I would suggest starting with
a design document that describes the intended changes in more detail.
> Enable Slot Resource Profile for Resource Management
> ----------------------------------------------------
>
> Key: FLINK-10640
> URL: https://issues.apache.org/jira/browse/FLINK-10640
> Project: Flink
> Issue Type: New Feature
> Components: ResourceManager
> Reporter: Tony Xintong Song
> Priority: Major
>
> Motivation & Background
> * The existing concept of task slots roughly represents how many pipelines of
> tasks a TaskManager can hold. However, it does not consider the differences
> in resource needs and usage of individual tasks. Enabling resource profiles
> of slots may allow Flink to better allocate execution resources according to
> tasks' fine-grained resource needs.
> * The community version of Flink already contains APIs and some implementation
> for slot resource profiles. However, this logic is not actually used.
> (The ResourceProfile of a slot request is set to UNKNOWN by default, with
> negative values, and thus matches any given slot.)
> Preliminary Design
> * Slot Management
> A slot represents a certain amount of resources for a single pipeline of
> tasks to run in on a TaskManager. Initially, a TaskManager does not have any
> slots, only a total amount of resources. When allocating, the ResourceManager
> finds suitable TMs on which to create new slots for the tasks, according to
> the slot requests. Once created, a slot's size (resource profile) does not
> change until the slot is freed. The ResourceManager can apply different, portable
> strategies to allocate slots from TaskManagers.
> * TM Management
> The size and number of TaskManagers and when to start them can also be
> flexible. TMs can be started and released dynamically, and may have different
> sizes. We may have many different, portable strategies. E.g., an elastic
> session that can run multiple jobs like the session mode while dynamically
> adjusting the size of the session (number of TMs) according to the real-time
> workload.
> * About Slot Sharing
> Slot sharing is a good heuristic to easily calculate how many slots are needed
> to get the job running and to get better utilization when there is no resource
> profile in slots. However, with resource profiles enabling finer-grained
> resource management, each individual task has its specific resource need and
> it does not make much sense to have multiple tasks sharing the resource of
> the same slot. Instead, we may introduce locality preferences/constraints to
> support the semantics of putting tasks in the same/different TMs in a more
> general way.
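
To make the slot management idea in the quoted description more concrete, here
is a minimal Java sketch. It is not Flink's actual API: SketchProfile,
SketchTaskManager and FirstFitAllocator are hypothetical names, the profile is
reduced to CPU cores and memory, and first-fit is only one possible allocation
strategy. It also assumes that a dynamically sized slot requires a concrete
request, so an UNKNOWN (negative-valued) request is rejected here, although
such a request would still match any existing slot in a plain comparison.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified resource profile (CPU cores and memory only). */
final class SketchProfile {
    /** Stand-in for an UNKNOWN profile: negative values, so any slot satisfies it. */
    static final SketchProfile UNKNOWN = new SketchProfile(-1.0, -1);

    final double cpuCores;
    final int memoryMb;

    SketchProfile(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    /** True if this profile offers at least the resources that 'required' asks for. */
    boolean canFulfil(SketchProfile required) {
        return cpuCores >= required.cpuCores && memoryMb >= required.memoryMb;
    }

    /** True if every dimension is a concrete (non-negative) value. */
    boolean isSpecified() {
        return cpuCores >= 0 && memoryMb >= 0;
    }
}

/** Hypothetical TaskManager: starts with a total resource budget and no slots. */
final class SketchTaskManager {
    private double freeCpuCores;
    private int freeMemoryMb;
    private final List<SketchProfile> slots = new ArrayList<>();

    SketchTaskManager(SketchProfile total) {
        this.freeCpuCores = total.cpuCores;
        this.freeMemoryMb = total.memoryMb;
    }

    /** Creates a slot of exactly the requested size if the free budget allows it. */
    boolean allocateSlot(SketchProfile request) {
        // Assumption of this sketch: a dynamically sized slot needs a concrete request.
        if (!request.isSpecified() || !free().canFulfil(request)) {
            return false;
        }
        freeCpuCores -= request.cpuCores;
        freeMemoryMb -= request.memoryMb;
        slots.add(request);
        return true;
    }

    /** Frees a previously created slot and returns its resources to the budget. */
    boolean freeSlot(SketchProfile slot) {
        if (!slots.remove(slot)) {
            return false;
        }
        freeCpuCores += slot.cpuCores;
        freeMemoryMb += slot.memoryMb;
        return true;
    }

    SketchProfile free() {
        return new SketchProfile(freeCpuCores, freeMemoryMb);
    }

    int slotCount() {
        return slots.size();
    }
}

/** One possible strategy: use the first TaskManager with enough free resources. */
final class FirstFitAllocator {
    /** Returns the TaskManager that now hosts the slot, or null if none fits. */
    static SketchTaskManager allocate(List<SketchTaskManager> taskManagers,
                                      SketchProfile request) {
        for (SketchTaskManager tm : taskManagers) {
            if (tm.allocateSlot(request)) {
                return tm;
            }
        }
        return null;
    }
}
```

In this sketch, a slot's size is fixed at creation and only returns to the
TaskManager's budget when the slot is freed, which mirrors the description
above. Swapping FirstFitAllocator for another class with the same method would
be one way to model the different strategies mentioned.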
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)