[
https://issues.apache.org/jira/browse/FLINK-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhu Zhu updated FLINK-15224:
----------------------------
Description:
In {{SchedulerImpl#allocateMultiTaskSlot}}, if a slot request cannot be
fulfilled immediately with a resolved root slot (a MultiTaskSlot that is
fulfilled by an allocated slot) or with an available slot, it is assigned to a
random unresolved root slot. No resource requirements check is performed in
this case, so a large task slot can be assigned to a small shared slot (an
unresolved root slot). When that shared slot later receives its physical slot
offer, it is recognized as oversubscribing, the slot is released, and the
related tasks fail.
This is not a problem for now since specified resources are not used yet, but
it can become a problem in the future when we support specified resources.
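The missing check can be illustrated with a minimal, self-contained sketch. It is not Flink code: {{Resources}}, {{PendingSharedSlot}} and {{pickWithCheck}} are hypothetical names, and the sketch assumes the resources requested for a pending shared slot are known when a task is assigned to it. The actual fix in {{SchedulerImpl}} may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class UnresolvedSlotCheckSketch {

    /** Hypothetical stand-in for a resource profile: CPU cores and memory in MB. */
    static final class Resources {
        final double cpuCores;
        final int memoryMb;

        Resources(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }

        Resources plus(Resources other) {
            return new Resources(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
        }

        boolean fitsIn(Resources capacity) {
            return cpuCores <= capacity.cpuCores && memoryMb <= capacity.memoryMb;
        }
    }

    /** Hypothetical model of a shared slot whose physical slot is requested but not yet offered. */
    static final class PendingSharedSlot {
        final String name;
        final Resources requested; // resources asked for the pending physical slot (assumed known)
        Resources reserved = new Resources(0.0, 0); // sum of task requests assigned so far

        PendingSharedSlot(String name, Resources requested) {
            this.name = name;
            this.requested = requested;
        }

        boolean canHost(Resources request) {
            return reserved.plus(request).fitsIn(requested);
        }
    }

    /**
     * Returns the first pending slot that can still host the request and reserves
     * the resources on it, instead of taking an arbitrary pending slot.
     */
    static Optional<PendingSharedSlot> pickWithCheck(
            List<PendingSharedSlot> pending, Resources request) {
        for (PendingSharedSlot slot : pending) {
            if (slot.canHost(request)) {
                slot.reserved = slot.reserved.plus(request);
                return Optional.of(slot);
            }
        }
        return Optional.empty(); // caller would have to request a new slot
    }

    public static void main(String[] args) {
        List<PendingSharedSlot> pending = new ArrayList<>();
        pending.add(new PendingSharedSlot("small", new Resources(1.0, 1024)));
        pending.add(new PendingSharedSlot("large", new Resources(4.0, 8192)));

        // A large task skips the small pending slot rather than oversubscribing it.
        Resources largeTask = new Resources(2.0, 4096);
        System.out.println(pickWithCheck(pending, largeTask).map(s -> s.name).orElse("none"));
    }
}
```

With the check in place, the large task is placed on the pending slot named "large", and a request that fits no pending slot yields an empty result so that a new slot can be requested instead.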
was:
In {{SchedulerImpl#allocateMultiTaskSlot}}, if a slot request cannot be
fulfilled immediately with a resolved root slot (a MultiTaskSlot that is
fulfilled by an allocated slot) or with an available slot, it is assigned to a
random unresolved root slot. No resource requirements check is performed in
this case, so a large task slot can be assigned to a small shared slot (an
unresolved root slot). When that shared slot later receives its physical slot
offer, it is recognized as oversubscribing, the slot is released, and the
related tasks fail.
This is not a problem for now since specified resources are not used yet, but
it can become a problem in the future if we'd like to support specified
resources.
> Resource requirements are not respected when fulfilling a slot request with
> unresolvedRootSlots from a SlotSharingManager
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-15224
> URL: https://issues.apache.org/jira/browse/FLINK-15224
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.0
> Reporter: Zhu Zhu
> Priority: Major