[ https://issues.apache.org/jira/browse/FLINK-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhu Zhu updated FLINK-14314:
----------------------------
Description:
With FLINK-14058, it is assumed that a shared slot should be large enough to be
used by one instance of each JobVertex in the group simultaneously.
To support this, a shared slot's resources should be the sum of all JobVertex
resources in the group.
Here's the concrete proposal:
1. Add a physicalSlotResourceProfile to SlotProfile, to be used for physical
slot allocation. Rename the previous ResourceProfile to taskResourceProfile,
which is used for logical slot allocation.
2. SharedSlotOversubscribedException and its handling can be removed, including
the releasing and re-allocating of part of the children slots. This is because
partial fulfillment should no longer happen with #1. A simple sanity check
against oversubscription can be kept.
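The proposal above can be illustrated with a minimal, self-contained sketch.
It does not use Flink's actual classes: SharedSlotSketch, Resources,
SlotProfileSketch, sumOf and checkNotOversubscribed are hypothetical names
chosen for illustration, and resources are reduced to CPU cores and memory.
It only shows the idea of sizing the physical slot as the sum of the group's
vertex resources while keeping a simple oversubscription check.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Flink's actual classes.
final class SharedSlotSketch {

    /** Simplified resource description: CPU cores and memory only (assumption). */
    static final class Resources {
        final double cpuCores;
        final int memoryMb;

        Resources(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }

        /** Component-wise sum of two resource descriptions. */
        Resources plus(Resources other) {
            return new Resources(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
        }

        /** True if this resource is at least as large as 'required' in every dimension. */
        boolean covers(Resources required) {
            return cpuCores >= required.cpuCores && memoryMb >= required.memoryMb;
        }
    }

    /** Mirrors item 1: one profile for the task, one for the physical slot. */
    static final class SlotProfileSketch {
        final Resources taskResourceProfile;          // logical slot allocation
        final Resources physicalSlotResourceProfile;  // physical slot allocation

        SlotProfileSketch(Resources task, Resources physicalSlot) {
            this.taskResourceProfile = task;
            this.physicalSlotResourceProfile = physicalSlot;
        }
    }

    /** Physical slot size: sum over one instance of each vertex in the group. */
    static Resources sumOf(List<Resources> vertexResources) {
        Resources total = new Resources(0.0, 0);
        for (Resources r : vertexResources) {
            total = total.plus(r);
        }
        return total;
    }

    /** Sanity check from item 2: resources already used plus the new task must fit. */
    static void checkNotOversubscribed(Resources physicalSlot, Resources alreadyUsed, Resources task) {
        if (!physicalSlot.covers(alreadyUsed.plus(task))) {
            throw new IllegalStateException("Shared slot would be oversubscribed");
        }
    }

    public static void main(String[] args) {
        List<Resources> group = Arrays.asList(
                new Resources(1.0, 512), new Resources(0.5, 256), new Resources(2.0, 1024));
        Resources physical = sumOf(group);

        // Every task in the group requests the same physical slot size.
        SlotProfileSketch profile = new SlotProfileSketch(group.get(0), physical);

        // Placing one instance of each vertex never trips the sanity check.
        Resources used = new Resources(0.0, 0);
        for (Resources task : group) {
            checkNotOversubscribed(profile.physicalSlotResourceProfile, used, task);
            used = used.plus(task);
        }
        System.out.println("cpu=" + physical.cpuCores + ", memoryMb=" + physical.memoryMb);
    }
}
```

Because the physical slot is sized up front for the whole group, the check
can only fail if more than one instance of a vertex lands in the same shared
slot, which is why a plain sanity check suffices in this sketch.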
was:
With FLINK-14058, it is assumed that a shared slot should be large enough to be
used by one instance of each JobVertex in the group simultaneously.
To support this, a shared slot's resources should be the sum of all JobVertex
resources in the group.
Here's the concrete proposal:
-1. Add a {{ResourceSpec}} to {{SlotSharingGroup}}. Set it to the merge of the
resources of all operators in the group when building the {{JobGraph}} in
{{StreamingJobGraphGenerator}}.- -> split out into FLINK-14734
2. Remove the resource bookkeeping logic for slot sharing, which was
introduced in FLINK-12765, so that a shared slot allocates its physical slot
according to the first resourceProfile it receives and performs no further
checks for slot allocations that arrive later. The next step guarantees that
the first resourceProfile is large enough.
3. Change {{ExecutionVertex#getResourceProfile}} to return the
{{SlotSharingGroup#resourceSpec}} if the vertex is in a slot sharing group;
otherwise return the {{ExecutionJobVertex#resourceProfile}}.
> Allocate shared slot resources respecting the resources of all vertices in
> the group
> ------------------------------------------------------------------------------------
>
> Key: FLINK-14314
> URL: https://issues.apache.org/jira/browse/FLINK-14314
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.10.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Fix For: 1.10.0