[ https://issues.apache.org/jira/browse/FLINK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658957#comment-16658957 ]
Tony Xintong Song commented on FLINK-10640:
-------------------------------------------
Thank you [~till.rohrmann]. I'll start drafting a design doc.
> Enable Slot Resource Profile for Resource Management
> ----------------------------------------------------
>
> Key: FLINK-10640
> URL: https://issues.apache.org/jira/browse/FLINK-10640
> Project: Flink
> Issue Type: New Feature
> Components: ResourceManager
> Reporter: Tony Xintong Song
> Priority: Major
>
> Motivation & Background
> * The existing concept of task slots roughly represents how many pipelines of
> tasks a TaskManager can hold. However, it does not consider the differences
> in resource needs and usage of individual tasks. Enabling resource profiles
> for slots may allow Flink to better allocate execution resources according to
> the tasks' fine-grained resource needs.
> * The community version of Flink already contains APIs and a partial
> implementation for slot resource profiles. However, this logic is not
> actually used: the ResourceProfile of a slot request is set to UNKNOWN by
> default, whose values are negative, so it matches any given slot.
> Preliminary Design
> * Slot Management
> A slot represents a certain amount of resources on a TaskManager in which a
> single pipeline of tasks runs. Initially, a TaskManager has no slots, only a
> total amount of resources. When allocating, the ResourceManager finds
> suitable TMs and creates new slots on them for the tasks to run in, according
> to the slot requests. Once created, a slot's size (resource profile) does not
> change until the slot is freed. The ResourceManager can apply different,
> pluggable strategies to allocate slots from TaskManagers.
> * TM Management
> The size and number of TaskManagers, and when to start them, can also be
> flexible. TMs can be started and released dynamically and may have different
> sizes. Many different, pluggable strategies are possible, e.g., an elastic
> session that can run multiple jobs like the session mode while dynamically
> adjusting the size of the session (number of TMs) according to the real-time
> workload.
> * About Slot Sharing
> When slots have no resource profile, slot sharing is a good heuristic: it
> makes it easy to calculate how many slots are needed to get the job running,
> and it yields better utilization. However, with resource profiles enabling
> finer-grained resource management, each individual task has its own specific
> resource needs, and it makes little sense for multiple tasks to share the
> resources of the same slot. Instead, we may introduce locality
> preferences/constraints to support the semantics of placing tasks in the same
> or different TMs in a more general way.
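The slot management model in the preliminary design (a TaskManager starts with only a total amount of resources, slots of a fixed profile are created on demand and returned when freed, and an UNKNOWN profile with negative values matches any slot) can be illustrated with a minimal Java sketch. Everything here is hypothetical: `Profile`, `Slot`, `TaskManager`, `canHost` and `allocateFirstFit` are illustrative names, not Flink's actual `ResourceProfile` or `SlotManager` API, resources are reduced to CPU cores and memory in MB, and first-fit is just one example of a pluggable allocation strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SlotAllocationSketch {

    /** Hypothetical, simplified resource profile (CPU cores, memory in MB); not Flink's ResourceProfile. */
    public record Profile(double cpuCores, long memoryMb) {

        /** Stand-in for the UNKNOWN profile described in the issue: negative values. */
        public static final Profile UNKNOWN = new Profile(-1.0, -1L);

        public boolean isUnknown() {
            return cpuCores < 0 || memoryMb < 0;
        }

        /** True if resources described by this profile can host a request for `required`. */
        public boolean canHost(Profile required) {
            // An UNKNOWN request matches anything, mirroring the current default behavior.
            if (required.isUnknown()) {
                return true;
            }
            return cpuCores >= required.cpuCores && memoryMb >= required.memoryMb;
        }

        public Profile minus(Profile other) {
            return new Profile(cpuCores - other.cpuCores, memoryMb - other.memoryMb);
        }

        public Profile plus(Profile other) {
            return new Profile(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
        }
    }

    /** A slot whose profile is fixed from creation until it is freed. */
    public record Slot(int id, Profile profile) {}

    /** A TaskManager that starts with no slots, only a total amount of resources. */
    public static final class TaskManager {
        private Profile free;
        private final List<Slot> slots = new ArrayList<>();
        private int nextId = 0;

        public TaskManager(Profile total) {
            this.free = total;
        }

        /** Creates a slot of exactly the requested size if enough resources remain. */
        public Optional<Slot> allocate(Profile required) {
            if (required.isUnknown()) {
                // Creating a slot needs a concrete size; sizing UNKNOWN requests is left open here.
                throw new IllegalArgumentException("cannot create a slot for an UNKNOWN profile");
            }
            if (!free.canHost(required)) {
                return Optional.empty();
            }
            free = free.minus(required);
            Slot slot = new Slot(nextId++, required);
            slots.add(slot);
            return Optional.of(slot);
        }

        /** Returns a freed slot's resources to this TaskManager's free pool. */
        public void release(Slot slot) {
            if (slots.remove(slot)) {
                free = free.plus(slot.profile());
            }
        }

        public Profile freeResources() {
            return free;
        }
    }

    /** One possible pluggable strategy: the first TaskManager that can host the request wins. */
    public static Optional<Slot> allocateFirstFit(List<TaskManager> taskManagers, Profile required) {
        for (TaskManager tm : taskManagers) {
            Optional<Slot> slot = tm.allocate(required);
            if (slot.isPresent()) {
                return slot;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TaskManager tm = new TaskManager(new Profile(4.0, 8192));
        Slot big = tm.allocate(new Profile(3.0, 6144)).orElseThrow();
        // Only 1.0 core and 2048 MB remain, so a 2-core request does not fit.
        System.out.println(tm.allocate(new Profile(2.0, 1024)).isPresent()); // prints "false"
        tm.release(big);
        System.out.println(tm.freeResources()); // prints "Profile[cpuCores=4.0, memoryMb=8192]"
    }
}
```

Keeping slot sizes immutable after creation, as the design proposes, means bookkeeping reduces to subtracting a profile on allocation and adding it back on release; a smarter strategy (best-fit, or one that also starts new TMs for an elastic session) would only replace `allocateFirstFit`.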