[ 
https://issues.apache.org/jira/browse/YUNIKORN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YUNIKORN-1795:
-----------------------------------
    Description: 
While debugging/profiling YUNIKORN-1724, the necessity of 
{{Application.schedule()}} has come into question. By default, this method is 
called every second to schedule tasks:
 * If the Application state is Reserving, we only schedule placeholders
 * If it is Running, then we schedule everything
 * Two other states are checked (New / Accepted)

After analyzing the code, some states in {{Application}} might be removed, 
simplifying the design. We can do this for the following reasons:
 # The task must not be sent to the core immediately, because we have to wait 
for the application to be accepted. Until that happens, we just keep collecting 
{{Task}} objects.
 # As soon as an accept comes back, we just go through all tasks that we have 
and schedule them. We can set a boolean flag (e.g. 
{{Application.accepted = true}}), so that new incoming tasks can be scheduled 
immediately.
 # In order to send the request to the core, we just need one extra call 
after we have created the {{Task}}. There's no need for the {{InitTask}} state.
 # Inside {{Task.handleSubmitTaskEvent()}}, we already have the 
{{Application}} object and know if it's a gang job or not. If it is and the 
task is not a placeholder, we simply wait for placeholder allocations and don't 
submit the task.
 # If a placeholder is allocated, we can always check if it is time for the 
normal pods to start running in {{postTaskBound()}}. We don't need an extra 
application event for that.
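
The reasons above can be illustrated with a minimal, event-driven sketch. This is not the shim's actual code: the types, fields and method names below (including {{placeholdersNeeded}}, {{sent}}, {{OnAccepted}} and {{OnPlaceholderBound}}) are hypothetical stand-ins, and sending a task to the core is reduced to appending its ID to a slice. It only shows that tasks can be buffered until the application is accepted, and that real tasks of a gang job can be released from the placeholder-bound callback without a periodic scheduling cycle.

```go
package main

import "fmt"

// Task is a simplified, hypothetical stand-in for the shim's Task.
type Task struct {
	ID          string
	Placeholder bool
	Submitted   bool
}

// Application is a simplified, hypothetical stand-in for the shim's Application.
type Application struct {
	accepted           bool    // the boolean flag suggested in the description
	gang               bool    // true if this is a gang-scheduled job
	placeholdersNeeded int     // assumption: known up front for gang jobs
	placeholdersBound  int     // number of placeholders bound so far
	tasks              []*Task // every task collected for this application
	sent               []string // IDs in the order they were "sent to the core"
}

// AddTask collects the task and submits it right away only if the
// application has already been accepted.
func (a *Application) AddTask(t *Task) {
	a.tasks = append(a.tasks, t)
	if a.accepted {
		a.submit(t)
	}
}

// OnAccepted is called once when the accept comes back; it sets the flag
// and walks all tasks collected so far.
func (a *Application) OnAccepted() {
	a.accepted = true
	for _, t := range a.tasks {
		a.submit(t)
	}
}

// submit sends a task at most once. For gang jobs, non-placeholder tasks
// are held back until all placeholders are bound.
func (a *Application) submit(t *Task) {
	if t.Submitted {
		return
	}
	if a.gang && !t.Placeholder && a.placeholdersBound < a.placeholdersNeeded {
		return
	}
	t.Submitted = true
	a.sent = append(a.sent, t.ID)
}

// OnPlaceholderBound plays the role of the check in postTaskBound(): once
// enough placeholders are bound, the held-back tasks are submitted.
func (a *Application) OnPlaceholderBound(t *Task) {
	if !t.Placeholder {
		return
	}
	a.placeholdersBound++
	if a.placeholdersBound >= a.placeholdersNeeded {
		for _, pending := range a.tasks {
			a.submit(pending)
		}
	}
}

func main() {
	app := &Application{gang: true, placeholdersNeeded: 1}
	ph := &Task{ID: "ph-1", Placeholder: true}
	app.AddTask(ph)
	app.AddTask(&Task{ID: "real-1"})
	fmt.Println(app.sent) // nothing is sent before the accept

	app.OnAccepted()
	fmt.Println(app.sent) // only the placeholder is sent

	app.OnPlaceholderBound(ph)
	fmt.Println(app.sent) // the real task follows once the placeholder is bound
}
```

In this sketch every state change is driven by an event (task added, application accepted, placeholder bound), so nothing has to poll once per second.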

With this in mind, I suggest removing the {{Application.Schedule()}} method and 
the {{InitTask}} state from {{Task}}. This makes the code simpler to understand 
and probably even faster.

  was:
While debugging/profiling YUNIKORN-1724, the necessity of 
{{Application.schedule()}} has come into question. By default, this method is 
called every second to schedule tasks:
 * If the Application state is Reserving, we only schedule placeholders
 * If it is Running, then we schedule everything
 * Two other states are checked (New / Accepted)

After analyzing the code, the entire state machine inside {{Application}} might 
be removed, greatly simplifying the design. We can do this for the following 
reasons:
 # The task must not be sent to the core immediately, because we have to wait 
for the application to be accepted. Until that happens, we just keep collecting 
{{Task}} objects.
 # As soon as an accept comes back, we just go through all tasks that we have 
and schedule them. We can set a boolean flag (e.g. 
{{Application.accepted = true}}), so that new incoming tasks can be scheduled 
immediately.
 # In order to send the request to the core, we just need one extra call 
after we have created the {{Task}}. There's no need for the {{InitTask}} state.
 # Inside {{Task.handleSubmitTaskEvent()}}, we already have the 
{{Application}} object and know if it's a gang job or not. If it is and the 
task is not a placeholder, we simply wait for placeholder allocations and don't 
submit the task.
 # If a placeholder is allocated, we can always check if it is time for the 
normal pods to start running in {{postTaskBound()}}. We don't need an extra 
application event for that.

With this in mind, I suggest removing the {{Application.Schedule()}} method and 
the {{InitTask}} state from {{Task}}. This makes the code simpler to understand 
and probably even faster.


> Remove the scheduling cycle in the shim
> ---------------------------------------
>
>                 Key: YUNIKORN-1795
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1795
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Peter Bacsko
>            Priority: Major



