[
https://issues.apache.org/jira/browse/YUNIKORN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879088#comment-17879088
]
Wilfred Spiegelenburg commented on YUNIKORN-2772:
-------------------------------------------------
This is a multi-step issue. We do not communicate a timestamp when we create
an application or an allocation. The issue does not just exist for an
application; the allocations are also involved. Sorting apps on a queue is one
side of the problem, but sorting allocations within an application could also
be off.
The k8shim creates the application based on what is considered the oldest pod
it finds (allocated or still pending). The create time of that originator pod
should be set as the application create time. The second point is that each
task which converts into an allocation should have a create time set based on
the pod details.
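
A minimal sketch of the selection described above, assuming a trimmed-down,
hypothetical Pod type rather than the real Kubernetes or k8shim types. The
names Pod, newPod and originatorPod are illustrative only and do not exist in
the shim.

```go
package main

import (
	"fmt"
	"time"
)

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes pod.
type Pod struct {
	Name              string
	AppID             string
	CreationTimestamp time.Time
}

// newPod builds a Pod whose creation time is given in Unix seconds.
func newPod(name, appID string, createdUnix int64) Pod {
	return Pod{Name: name, AppID: appID, CreationTimestamp: time.Unix(createdUnix, 0)}
}

// originatorPod returns the oldest pod in the slice, regardless of whether it
// is allocated or still pending. ok is false when the slice is empty.
func originatorPod(pods []Pod) (oldest Pod, ok bool) {
	for i, p := range pods {
		if i == 0 || p.CreationTimestamp.Before(oldest.CreationTimestamp) {
			oldest = p
		}
	}
	return oldest, len(pods) > 0
}

func main() {
	pods := []Pod{
		newPod("executor-1", "app-1", 1060),
		newPod("driver", "app-1", 1000),
	}
	if origin, ok := originatorPod(pods); ok {
		// The application create time is taken from the originator pod.
		fmt.Println("app create time:", origin.CreationTimestamp.Unix())
	}
	// Each allocation takes its create time from its own pod.
	for _, p := range pods {
		fmt.Println("allocation", p.Name, "create time:", p.CreationTimestamp.Unix())
	}
}
```

The point of the sketch is that neither value is generated at recovery time:
both come from data that already exists on the pods.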
These two changes made on the k8shim side need to be communicated to the core,
and the create steps should pick up these two new values and not use a new
timestamp. The create time is currently communicated through a tag on the
application, as per the YUNIKORN-1155 changes to support the placeholder
timeout fix on recovery. That tag is always set and could be used to set the
create time. The allocation can follow the same principle.
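
A rough sketch of what the core side could look like, under stated
assumptions: the tag key creationTimeTag is a placeholder, and the value is
assumed to be encoded as Unix seconds. The real key and encoding should be
checked against the scheduler interface constants. createTimeFromTags is a
hypothetical helper, not an existing core function.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// creationTimeTag is a placeholder tag key used for illustration only.
const creationTimeTag = "creationTime"

// createTimeFromTags returns the create time carried in the tags, assuming the
// value is encoded as Unix seconds. ok is false when the tag is missing or
// cannot be parsed, so the caller can decide on a fallback.
func createTimeFromTags(tags map[string]string) (created time.Time, ok bool) {
	value, found := tags[creationTimeTag]
	if !found {
		return time.Time{}, false
	}
	seconds, err := strconv.ParseInt(value, 10, 64)
	if err != nil || seconds <= 0 {
		return time.Time{}, false
	}
	return time.Unix(seconds, 0), true
}

func main() {
	tags := map[string]string{creationTimeTag: "1724000000"}
	created, ok := createTimeFromTags(tags)
	if !ok {
		// Only fall back to a fresh timestamp when nothing usable was sent.
		created = time.Now()
	}
	fmt.Println("application create time:", created.Unix())
}
```

An allocation could carry its pod create time in the same way and go through
the same lookup when it is created in the core.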
Starting work on this.
> Scheduler restart does not preserve app start time
> --------------------------------------------------
>
> Key: YUNIKORN-2772
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2772
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Reporter: Mit Desai
> Assignee: Wilfred Spiegelenburg
> Priority: Critical
>
> Whenever the scheduler is restarted, the create time of every application is
> set to the current time, ignoring the original value that comes from the API
> server.
> Due to this, FIFO sorting can show irregularities in scheduling.
> If there is an App1 that started 2 days ago and an App2 that started 1 day
> ago, then during a scheduler restart both apps will get almost the same
> create time (nanoseconds apart). App2's create time can be just a few
> nanoseconds earlier than App1's, and hence App2 gets priority over App1.
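
The scenario in the description can be reproduced with a small sketch. The App
type and the fifoOrder helper are hypothetical and only model "oldest create
time first"; they are not the core's sorting code.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// App is a hypothetical stand-in for a scheduler application.
type App struct {
	ID         string
	CreateTime time.Time
}

// newApp builds an App whose create time is given in Unix nanoseconds.
func newApp(id string, createdUnixNano int64) App {
	return App{ID: id, CreateTime: time.Unix(0, createdUnixNano)}
}

// fifoOrder returns the application IDs sorted oldest create time first.
func fifoOrder(apps []App) []string {
	sorted := append([]App(nil), apps...) // copy, leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CreateTime.Before(sorted[j].CreateTime)
	})
	ids := make([]string, len(sorted))
	for i, a := range sorted {
		ids[i] = a.ID
	}
	return ids
}

func main() {
	const day = int64(24 * time.Hour)
	now := time.Now().UnixNano()

	// Before the restart: App1 is a full day older than App2.
	before := []App{newApp("App1", now-2*day), newApp("App2", now-day)}
	fmt.Println("before restart:", fifoOrder(before))

	// After the restart: both get a fresh timestamp, and App2 happens to be
	// recovered a few nanoseconds before App1.
	after := []App{newApp("App1", now+50), newApp("App2", now+10)}
	fmt.Println("after restart: ", fifoOrder(after))
}
```

With the original timestamps App1 sorts first; with the fresh timestamps the
order depends only on which application was recovered first.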
--
This message was sent by Atlassian Jira
(v8.20.10#820010)