[
https://issues.apache.org/jira/browse/FLINK-31080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717456#comment-17717456
]
Weijie Guo commented on FLINK-31080:
------------------------------------
Thanks for picking this up. I didn't fix it sooner because I found that this
code path (releasing idle slots under the adaptive scheduler) should never
have been triggered before. After FLINK-31399 it can be, so let's fix it now.
> I think we need to put timestamp maintenance inside the declarativeSlotPool
> to maintain uniform timestamp semantics.
Yes, ideally they should share the same clock, and we can also pass the clock
to the {{Scheduler}}.
> If that is fine with you, can I share the patch in flink-runtime?
Sure, but I'm leaning toward unifying the clocks between {{SlotPoolService}}
and {{Scheduler}}, like Weihua said. What do you think?
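
To make the clock-unification idea concrete, here is a minimal sketch of what sharing one clock could look like. It is not the actual Flink code or the proposed patch: {{PoolSide}}, {{SchedulerSide}}, {{ManualClock}} and {{releasedAfterTimeout}} are hypothetical stand-ins, and the clock is modelled as a plain {{LongSupplier}} rather than Flink's own clock type.

```java
import java.util.function.LongSupplier;

public class SharedClockSketch {

    /** Hypothetical stand-in for the component that frees a slot and stamps it. */
    static final class SchedulerSide {
        private final LongSupplier clock;

        SchedulerSide(LongSupplier clock) {
            this.clock = clock;
        }

        /** Returns the timestamp at which the slot became idle. */
        long freeSlot() {
            return clock.getAsLong();
        }
    }

    /** Hypothetical stand-in for the component that runs the idle timeout check. */
    static final class PoolSide {
        private final LongSupplier clock;

        PoolSide(LongSupplier clock) {
            this.clock = clock;
        }

        boolean shouldRelease(long idleSinceMillis, long timeoutMillis) {
            return clock.getAsLong() - idleSinceMillis >= timeoutMillis;
        }
    }

    /** Manually advanced clock so the example is deterministic. */
    static final class ManualClock implements LongSupplier {
        private long nowMillis;

        void advance(long millis) {
            nowMillis += millis;
        }

        @Override
        public long getAsLong() {
            return nowMillis;
        }
    }

    /** Both sides receive the same clock instance, so their timestamps are comparable. */
    static boolean releasedAfterTimeout(long timeoutMillis, long idleForMillis) {
        ManualClock clock = new ManualClock();
        SchedulerSide scheduler = new SchedulerSide(clock);
        PoolSide pool = new PoolSide(clock);

        long idleSince = scheduler.freeSlot();
        clock.advance(idleForMillis);
        return pool.shouldRelease(idleSince, timeoutMillis);
    }

    public static void main(String[] args) {
        System.out.println(releasedAfterTimeout(50_000L, 60_000L));
        System.out.println(releasedAfterTimeout(50_000L, 10_000L));
    }
}
```

Passing the clock through the constructors means neither side can silently pick a different time source, and a manual clock can be substituted in tests.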
> Idle slots are not released due to a mismatch in time between
> DeclarativeSlotPoolService and SlotSharingSlotAllocator
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-31080
> URL: https://issues.apache.org/jira/browse/FLINK-31080
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.17.0, 1.16.1
> Reporter: Prabhu Joseph
> Assignee: Weijie Guo
> Priority: Major
> Labels: pull-request-available
>
> Due to a timing mismatch between {{DeclarativeSlotPoolService}} and
> {{SlotSharingSlotAllocator}}, idle slots are not released.
> {{DeclarativeSlotPoolService}} uses {{SystemClock#relativeTimeMillis}},
> i.e., {{System.nanoTime() / 1_000_000}}, while offering a slot, whereas
> {{SlotSharingSlotAllocator}} uses {{System.currentTimeMillis()}} while
> freeing the reserved slot.
> The idle timeout check wrongly fails because the value of
> {{System.currentTimeMillis()}} is far larger than the value of
> {{SystemClock#relativeTimeMillis}}.
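
The effect described above can be reproduced with a small, self-contained sketch. The check in {{isIdleExpired}} is an assumed shape for an idle timeout comparison, not the actual Flink implementation, and the timestamps are illustrative constants rather than real clock readings.

```java
public class IdleTimeoutMismatch {

    /** Hypothetical idle check: expired once the slot has been idle for at least timeoutMillis. */
    static boolean isIdleExpired(long idleSinceMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - idleSinceMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        long timeoutMillis = 50_000L;

        // Illustrative readings (assumptions, not measurements):
        long wallClockAtFree = 1_683_000_000_000L; // epoch-based, like System.currentTimeMillis()
        long relativeAtFree = 120_000L;            // arbitrary origin, like System.nanoTime() / 1_000_000
        long relativeAtCheck = relativeAtFree + 60_000L; // 60 s later on the relative clock

        // Mixed clocks: the idle-since stamp is epoch-based, "now" is relative.
        // The difference is hugely negative, so the slot never looks expired.
        System.out.println(isIdleExpired(wallClockAtFree, relativeAtCheck, timeoutMillis)); // false

        // Same clock on both sides: 60 s idle exceeds the 50 s timeout.
        System.out.println(isIdleExpired(relativeAtFree, relativeAtCheck, timeoutMillis)); // true
    }
}
```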