[
https://issues.apache.org/jira/browse/FLINK-31080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717439#comment-17717439
]
Weihua Hu commented on FLINK-31080:
-----------------------------------
IIUC, DeclarativeSlotPool is currently used by two components (SlotPoolService
and Scheduler). SlotPoolService offers slots (setting the timestamp), while the
Scheduler uses and releases them (updating the timestamp). This requires every
component that uses DeclarativeSlotPool to follow the same timestamp semantics,
but we don't have a mechanism to ensure this.
So I think we need to move timestamp maintenance inside DeclarativeSlotPool so
that the timestamp semantics stay uniform.
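To make the proposal concrete, here is a minimal sketch of a pool that owns its time source, so callers never pass timestamps in. It is only an illustration: {{TimestampOwningSlotPool}}, its methods, and the String slot ids are hypothetical and do not mirror the actual DeclarativeSlotPool interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch, NOT Flink's DeclarativeSlotPool API. */
final class TimestampOwningSlotPool {
    // The single time source used for every timestamp in the pool.
    private final LongSupplier clockMillis;
    private final long idleTimeoutMillis;
    // Slot id -> time (per clockMillis) at which the slot became idle.
    private final Map<String, Long> idleSince = new HashMap<>();

    TimestampOwningSlotPool(LongSupplier clockMillis, long idleTimeoutMillis) {
        this.clockMillis = clockMillis;
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Convenience factory using a monotonic clock (System.nanoTime based). */
    static TimestampOwningSlotPool withMonotonicClock(long idleTimeoutMillis) {
        return new TimestampOwningSlotPool(
                () -> System.nanoTime() / 1_000_000, idleTimeoutMillis);
    }

    /** Called by the offering side; no timestamp parameter on purpose. */
    void offerSlot(String slotId) {
        idleSince.put(slotId, clockMillis.getAsLong());
    }

    /** Called by the scheduler side when it starts using a slot. */
    void reserveSlot(String slotId) {
        idleSince.remove(slotId);
    }

    /** Called by the scheduler side; the pool stamps the idle time itself. */
    void freeReservedSlot(String slotId) {
        idleSince.put(slotId, clockMillis.getAsLong());
    }

    /** Drops slots idle for at least the timeout; returns how many were dropped. */
    int releaseIdleSlots() {
        long now = clockMillis.getAsLong();
        int before = idleSince.size();
        idleSince.values().removeIf(since -> now - since >= idleTimeoutMillis);
        return before - idleSince.size();
    }
}
```

Because neither the offering side nor the scheduler side supplies a timestamp, they cannot disagree about which clock is in use.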
WDYT, [~Weijie Guo] [~prabhujoseph]
> Idle slots are not released due to a mismatch in time between
> DeclarativeSlotPoolService and SlotSharingSlotAllocator
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-31080
> URL: https://issues.apache.org/jira/browse/FLINK-31080
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.17.0, 1.16.1
> Reporter: Prabhu Joseph
> Assignee: Weijie Guo
> Priority: Major
> Labels: pull-request-available
>
> Due to a timing mismatch between {{DeclarativeSlotPoolService}} and
> {{SlotSharingSlotAllocator}}, idle slots are not released.
> {{DeclarativeSlotPoolService}} uses {{SystemClock#relativeTimeMillis}},
> i.e., {{System.nanoTime() / 1_000_000}}, while offering a slot, whereas
> {{SlotSharingSlotAllocator}} uses {{System.currentTimeMillis()}} while
> freeing the reserved slot.
> The idle timeout check fails wrongly as {{System.currentTimeMillis()}}
> will have a very high value compared to
> {{SystemClock#relativeTimeMillis}}.
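The mismatch described above can be reproduced in isolation. The sketch below assumes the idle check has the form "now - idleSince >= timeout"; that form, the class name {{ClockMismatchDemo}}, and the 50-second timeout are illustrative assumptions, not code copied from Flink.

```java
final class ClockMismatchDemo {
    /** Assumed shape of the idle check: elapsed time compared against the timeout. */
    static boolean isIdleTimedOut(long idleSinceMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - idleSinceMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        long timeoutMillis = 50_000L; // illustrative value

        // Slot freed using wall-clock time (milliseconds since the Unix epoch).
        long freedAtWallClock = System.currentTimeMillis();
        // Slot freed using the relative clock (arbitrary origin, monotonic).
        long freedAtRelative = System.nanoTime() / 1_000_000;

        // Pretend the idle check runs 60 seconds later, on the relative clock.
        long checkedAtRelative = freedAtRelative + 60_000L;

        // Mixed clocks: the "elapsed" time is meaningless because the two
        // values do not share an origin.
        System.out.println("mixed clocks elapsed ms: " + (checkedAtRelative - freedAtWallClock));
        System.out.println("mixed clocks timed out:  "
                + isIdleTimedOut(freedAtWallClock, checkedAtRelative, timeoutMillis));

        // Same clock on both sides: elapsed time is the real 60 seconds.
        System.out.println("same clock timed out:    "
                + isIdleTimedOut(freedAtRelative, checkedAtRelative, timeoutMillis));
    }
}
```

{{System.nanoTime()}} has an unspecified origin, so the exact numbers differ per JVM and host. Whenever the wall-clock value is the larger one, as the issue reports, the computed elapsed time is negative and the timeout is never reached.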