[ 
https://issues.apache.org/jira/browse/FLINK-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14314:
----------------------------
    Description: 
With FLINK-14058, it is assumed that a shared slot is large enough to be 
used by one instance of each JobVertex in the group simultaneously.

To support this, the resources of a shared slot should be the sum of the 
resources of all JobVertices in the group.
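
As a minimal sketch of the sizing rule above: the shared slot's resources are the component-wise sum over the vertices in the group. The class names {{SimpleResources}} and {{SharedSlotSizing}}, and the two resource dimensions, are hypothetical stand-ins chosen for illustration; they are not Flink's actual {{ResourceProfile}} API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified resource description; NOT Flink's ResourceProfile. */
final class SimpleResources {
    final double cpuCores;
    final int memoryMb;

    SimpleResources(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    /** Returns a new instance holding the component-wise sum. */
    SimpleResources plus(SimpleResources other) {
        return new SimpleResources(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
    }
}

/** Hypothetical helper that sizes a shared slot for one slot sharing group. */
final class SharedSlotSizing {
    /** Sums the per-vertex resources of all vertices in the group. */
    static SimpleResources sumOf(List<SimpleResources> vertexResources) {
        SimpleResources total = new SimpleResources(0.0, 0);
        for (SimpleResources r : vertexResources) {
            total = total.plus(r);
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: three vertices sharing one slot.
        SimpleResources shared = sumOf(Arrays.asList(
                new SimpleResources(1.0, 256),
                new SimpleResources(0.5, 512),
                new SimpleResources(0.5, 256)));
        System.out.println(shared.cpuCores + " cores, " + shared.memoryMb + " MB");
    }
}
```

With such a sum, one instance of every vertex in the group fits into the slot at the same time, which is the assumption stated above.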

Here's the concrete proposal:
1. Add a physicalSlotResourceProfile to SlotProfile, to be used for 
physical slot allocation. Rename the existing ResourceProfile to 
taskResourceProfile, to be used for logical slot allocation.
2. Remove SharedSlotOversubscribedException and its handling, including 
the related part of the child slot release and re-allocation logic. 
Partial fulfillment should no longer happen with #1. A simple sanity 
check against oversubscription can be kept.
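
The two steps above might look roughly like the following sketch. All classes here ({{Res}}, {{SlotProfileSketch}}, {{SharedSlotSketch}}) are hypothetical simplifications, with memory as the only resource dimension; this is not the actual Flink implementation. The physical slot is sized once from physicalSlotResourceProfile, each logical slot draws on taskResourceProfile, and a plain sanity check stands in for the removed exception handling.

```java
/** Hypothetical resource amount (memory only, for brevity); NOT Flink's ResourceProfile. */
final class Res {
    final int memoryMb;

    Res(int memoryMb) {
        this.memoryMb = memoryMb;
    }
}

/** Hypothetical stand-in for SlotProfile carrying both profiles from step 1. */
final class SlotProfileSketch {
    final Res taskResourceProfile;         // needs of a single task (logical slot)
    final Res physicalSlotResourceProfile; // needs of the whole shared slot (physical slot)

    SlotProfileSketch(Res taskResourceProfile, Res physicalSlotResourceProfile) {
        this.taskResourceProfile = taskResourceProfile;
        this.physicalSlotResourceProfile = physicalSlotResourceProfile;
    }
}

/** Hypothetical shared slot: sized once up front, then handed out as logical slots. */
final class SharedSlotSketch {
    private final int capacityMb;
    private int usedMb;

    /** The physical allocation uses the physical profile of the first request. */
    SharedSlotSketch(SlotProfileSketch firstRequest) {
        this.capacityMb = firstRequest.physicalSlotResourceProfile.memoryMb;
    }

    /** Logical allocation with a simple sanity check instead of release and re-allocation. */
    void allocateLogicalSlot(SlotProfileSketch request) {
        int needed = request.taskResourceProfile.memoryMb;
        if (usedMb + needed > capacityMb) {
            throw new IllegalStateException(
                    "Shared slot oversubscribed: " + (usedMb + needed) + " MB > " + capacityMb + " MB");
        }
        usedMb += needed;
    }

    int remainingMb() {
        return capacityMb - usedMb;
    }
}

final class SharedSlotDemo {
    public static void main(String[] args) {
        Res physical = new Res(768); // assumed to be the sum over the group
        SlotProfileSketch a = new SlotProfileSketch(new Res(256), physical);
        SlotProfileSketch b = new SlotProfileSketch(new Res(512), physical);

        SharedSlotSketch slot = new SharedSlotSketch(a);
        slot.allocateLogicalSlot(a);
        slot.allocateLogicalSlot(b);
        System.out.println("remaining MB: " + slot.remainingMb());
    }
}
```

Because the physical profile already covers the whole group, the check is only expected to fail on a programming error, so no recovery path is sketched.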


  was:
With FLINK-14058, it is assumed that a shared slot is large enough to be 
used by one instance of each JobVertex in the group simultaneously.

To support this, the resources of a shared slot should be the sum of the 
resources of all JobVertices in the group.

Here's the concrete proposal:
-1. Add a {{ResourceSpec}} to {{SlotSharingGroup}}. Set it to the merge of the 
resources of all operators in the group when building the {{JobGraph}} in 
{{StreamingJobGraphGenerator}}- -> split out as FLINK-14734
2. Remove the resource bookkeeping logic for slot sharing, which was 
introduced in FLINK-12765, so that a shared slot allocates its physical slot 
based on the first resourceProfile it receives and performs no further checks 
for slot allocations that arrive on it later. The next step guarantees that 
the first resourceProfile is sufficient.
3. Change {{ExecutionVertex#getResourceProfile}} to return the 
{{SlotSharingGroup#resourceSpec}} if the vertex is in a slot sharing group, 
and the {{ExecutionJobVertex#resourceProfile}} otherwise.



> Allocate shared slot resources respecting the resources of all vertices in 
> the group
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-14314
>                 URL: https://issues.apache.org/jira/browse/FLINK-14314
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zhu Zhu
>            Assignee: Zhu Zhu
>            Priority: Major
>             Fix For: 1.10.0
>
>
> With FLINK-14058, it is assumed that a shared slot is large enough to 
> be used by one instance of each JobVertex in the group simultaneously.
> To support this, the resources of a shared slot should be the sum of the 
> resources of all JobVertices in the group.
> Here's the concrete proposal:
> 1. Add a physicalSlotResourceProfile to SlotProfile, to be used for 
> physical slot allocation. Rename the existing ResourceProfile to 
> taskResourceProfile, to be used for logical slot allocation.
> 2. Remove SharedSlotOversubscribedException and its handling, including 
> the related part of the child slot release and re-allocation logic. 
> Partial fulfillment should no longer happen with #1. A simple sanity 
> check against oversubscription can be kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
