[ 
https://issues.apache.org/jira/browse/FLINK-24005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406770#comment-17406770
 ] 

Till Rohrmann edited comment on FLINK-24005 at 8/30/21, 2:59 PM:
-----------------------------------------------------------------

I am not sure whether making the {{DefaultDeclarativeSlotPool#releaseSlots}} 
smarter is the right solution here. The problem I see is that it moves 
responsibilities of the {{DeclarativeSlotPoolBridge}} into the 
{{DefaultDeclarativeSlotPool}}. Without the bridge, this specific functionality 
wouldn't be needed in the {{DefaultDeclarativeSlotPool}}. Instead, the whole 
logic to increment/decrement the resource requirements when 
allocating/releasing slots should, in my opinion, live in the 
{{DeclarativeSlotPoolBridge}}.
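Roughly what I mean, as a minimal sketch (simplified, hypothetical types; this is not the actual Flink API): the bridge itself owns the requirement bookkeeping around reserving and releasing slots, so the underlying pool stays agnostic of it.

```java
// Hypothetical sketch, not the real Flink classes: the bridge increments
// the declared requirements when a slot is reserved and decrements them
// when a slot is released, keeping the underlying pool out of it.
import java.util.HashMap;
import java.util.Map;

class RequirementsSketch {
    // declared requirements per (simplified) resource profile name
    static final Map<String, Integer> declared = new HashMap<>();

    static void reserveSlot(String profile) {
        // bridge increments the declared requirement on reservation
        declared.merge(profile, 1, Integer::sum);
    }

    static void releaseSlot(String profile) {
        // bridge decrements the declared requirement on release
        declared.merge(profile, -1, Integer::sum);
    }

    public static void main(String[] args) {
        reserveSlot("default");
        reserveSlot("default");
        releaseSlot("default");
        System.out.println(declared.get("default")); // prints 1
    }
}
```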

I do see the problem caused by 
{{DefaultDeclarativeSlotPool#adjustRequirements}}. I think it can work if we 
return the actual {{ResourceProfile}} from the 
{{DeclarativeSlotPool.reserveFreeSlot}} method.
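To illustrate the idea with a hypothetical, simplified sketch (the names and types here are stand-ins, not the real Flink API): if reserving a free slot returns the profile of the slot that was actually matched, the caller can later decrement the requirements by exactly that profile instead of re-deriving it.

```java
// Hypothetical, simplified illustration (not the real Flink API):
// reserveFreeSlot returns the profile of the slot actually matched,
// which may be larger than what was requested.
import java.util.ArrayList;
import java.util.List;

class ReserveSketch {
    record ResourceProfile(int cpu, int memoryMb) {}

    static final List<ResourceProfile> freeSlots = new ArrayList<>();

    /** Returns the profile of the slot that was actually reserved. */
    static ResourceProfile reserveFreeSlot(ResourceProfile required) {
        for (ResourceProfile slot : freeSlots) {
            // a slot matches if it is at least as large as the requirement
            if (slot.cpu() >= required.cpu()
                    && slot.memoryMb() >= required.memoryMb()) {
                freeSlots.remove(slot);
                return slot; // the actual profile, not the requested one
            }
        }
        throw new IllegalStateException("no matching free slot");
    }

    public static void main(String[] args) {
        freeSlots.add(new ResourceProfile(4, 4096));
        // request a small slot; the returned profile is the 4-cpu slot
        ResourceProfile actual = reserveFreeSlot(new ResourceProfile(2, 1024));
        System.out.println(actual);
    }
}
```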

On a related note, isn't {{DefaultDeclarativeSlotPool#adjustRequirements}} only 
required by the default scheduler? Maybe this is something that should actually 
live in the {{DeclarativeSlotPoolBridge}}, too, because the 
{{AdaptiveScheduler}} does not need this remapping, if I am not mistaken.

Maybe we can make the {{releaseSlots}} smarter as a band-aid for {{1.14.0}} and 
then fix the problem properly in {{1.14.1}} and {{1.15.0}}.


> Resource requirements declaration may be incorrect if JobMaster disconnects 
> with a TaskManager with available slots in the SlotPool
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24005
>                 URL: https://issues.apache.org/jira/browse/FLINK-24005
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0, 1.12.5, 1.13.2
>            Reporter: Zhu Zhu
>            Assignee: Chesnay Schepler
>            Priority: Critical
>             Fix For: 1.14.0, 1.12.6, 1.13.3
>
>         Attachments: decrease_resource_requirements.log
>
>
> When a TaskManager disconnects from the JobMaster, it triggers 
> `DeclarativeSlotPoolService#decreaseResourceRequirementsBy()` for all the 
> slots that are registered to the JobMaster from that TaskManager. If the 
> slots are still available, i.e. not assigned to any task, the 
> `decreaseResourceRequirementsBy` call may lead to an incorrect resource 
> requirements declaration.
> For example, consider a job with only 3 source tasks. It requires 3 slots 
> and declares 3 slots. Initially all the tasks are running. Suddenly one 
> task fails and waits for some delay before restarting. Its slot is 
> returned to the SlotPool. Now the job requires 2 slots and declares 2 
> slots. At this moment, the TaskManager of that returned slot gets lost. 
> After the triggered `decreaseResourceRequirementsBy`, the job only 
> declares 1 slot. Finally, when the failed task is re-scheduled, the job 
> will declare 2 slots while it actually needs 3.
> The attached log of a real job, as well as the logs of the test added in 
> https://github.com/zhuzhurk/flink/commit/59ca0ac5fa9c77b97c6e8a43dcc53ca8a0ad6c37,
> demonstrate this case.
> Note that the real job is configured with a large 
> "restart-strategy.fixed-delay.delay" and a large "slot.idle.timeout", so 
> this is presumably a rare case in production.
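The sequence described above boils down to plain counter arithmetic. A hypothetical sketch (simple integers, not the actual Flink bookkeeping) of how the declared requirements drift away from the real demand:

```java
// Hypothetical replay of the counting bug from the scenario above,
// using a plain integer instead of Flink's requirement bookkeeping.
class DeclarationBugSketch {
    /** Replays the declaration updates from the described scenario. */
    static int declaredAfterScenario() {
        int declared = 3; // 3 source tasks running, 3 slots declared
        declared -= 1;    // one task fails, its slot returns to the SlotPool
        declared -= 1;    // the TaskManager holding that free slot is lost:
                          // decreaseResourceRequirementsBy() decrements again
        declared += 1;    // the failed task is re-scheduled and re-declared
        return declared;  // 2 declared, although 3 slots are actually needed
    }

    public static void main(String[] args) {
        System.out.println(declaredAfterScenario() + " declared vs 3 required");
    }
}
```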



--
This message was sent by Atlassian Jira
(v8.3.4#803005)