Peter Bacsko created YUNIKORN-2599:
--------------------------------------

             Summary: AppStateChange/AppTaskCompleted event cannot be handled in many states
                 Key: YUNIKORN-2599
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2599
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: shim - yarn
            Reporter: Peter Bacsko


After YUNIKORN-2597 was merged, it became clear that we keep sending an
{{AppStateChange}} event that the state machine cannot handle: no state in the
FSM object is able to process this event.
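To make the problem concrete, here is a minimal sketch, assuming a heavily simplified transition table. The {{transition}} type, the table entries and the {{statesAccepting}} helper are hypothetical and are not the shim's real FSM definition; the only property they encode from this issue is that no transition is registered for {{AppStateChange}}, so scanning the table finds no state that can accept it.

```go
package main

import (
	"fmt"
	"sort"
)

// transition is one edge of a hypothetical, simplified application FSM.
type transition struct {
	event string
	src   []string
	dst   string
}

// transitions is an illustrative subset only (NOT the shim's real table).
// Note that no entry exists for "AppStateChange".
var transitions = []transition{
	{event: "SubmitApplication", src: []string{"New"}, dst: "Submitted"},
	{event: "AcceptApplication", src: []string{"Submitted"}, dst: "Accepted"},
	{event: "RunApplication", src: []string{"Accepted", "Reserving"}, dst: "Running"},
	{event: "CompleteApplication", src: []string{"Running"}, dst: "Completed"},
}

// statesAccepting returns, sorted, every source state from which event can fire.
// An empty result means no state can handle the event.
func statesAccepting(event string) []string {
	var states []string
	for _, t := range transitions {
		if t.event == event {
			states = append(states, t.src...)
		}
	}
	sort.Strings(states)
	return states
}

func main() {
	fmt.Println(statesAccepting("AppStateChange")) // []
	fmt.Println(statesAccepting("RunApplication")) // [Accepted Reserving]
}
```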

{{AppTaskCompleted}} is very similar: it is only processed in the {{Resuming}}
state.
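A second sketch, again under loudly stated assumptions: the {{accepts}} map and the {{canHandle}} function are hypothetical, loosely comparable to the {{Can}} method of the {{github.com/looplab/fsm}} package. The map encodes only the fact stated above, that {{AppTaskCompleted}} is accepted in {{Resuming}} and nowhere else.

```go
package main

import "fmt"

// accepts maps an event to the states in which a hypothetical, simplified
// application FSM would process it. Only the AppTaskCompleted entry reflects
// this issue (handled only in Resuming); all other events are omitted.
var accepts = map[string]map[string]bool{
	"AppTaskCompleted": {"Resuming": true},
}

// canHandle reports whether event has a transition out of state.
// A missing event yields a nil inner map, and indexing a nil map returns false.
func canHandle(state, event string) bool {
	return accepts[event][state]
}

func main() {
	for _, state := range []string{"Running", "Resuming"} {
		fmt.Printf("AppTaskCompleted in %s: %v\n", state, canHandle(state, "AppTaskCompleted"))
	}
}
```

In the logs below, every rejected event arrives while the application is in the {{Running}} state, which corresponds to the {{false}} case in this sketch.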

Running the test case {{TestApplicationScheduling}} displays the following
errors:
{noformat}
[...]
2024-05-02T18:08:14.856+0200    ERROR   shim.context    cache/context.go:1316   application event cannot be handled in the current state        {"applicationID": "app0001", "event": "AppStateChange", "state": "Running"}
github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
2024-05-02T18:08:14.856+0200    INFO    core.scheduler.application      
[...] 
2024-05-02T18:08:14.857+0200    INFO    core.scheduler.partition        scheduler/partition.go:928      scheduler allocation processed  {"appID": "app0001", "allocationKey": "task0002", "allocatedResource": "map[memory:10000000 pods:1 vcore:1]", "placeholder": false, "targetNode": "test.host.02"}
2024-05-02T18:08:14.857+0200    ERROR   shim.context    cache/context.go:1316   application event cannot be handled in the current state        {"applicationID": "app0001", "event": "AppStateChange", "state": "Running"}
github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
[...]
2024-05-02T18:08:15.856+0200    INFO    shim.fsm        cache/task_state.go:380 Task state transition   {"app": "app0001", "task": "task0001", "taskAlias": "default/task0001", "source": "Bound", "destination": "Completed", "event": "CompleteTask"}
2024-05-02T18:08:15.856+0200    ERROR   shim.context    cache/context.go:1316   application event cannot be handled in the current state        {"applicationID": "app0001", "event": "AppTaskCompleted", "state": "Running"}
github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
[...]
2024-05-02T18:08:16.858+0200    INFO    shim.fsm        cache/task_state.go:380 Task state transition   {"app": "app0001", "task": "task0002", "taskAlias": "default/task0002", "source": "Bound", "destination": "Completed", "event": "CompleteTask"}
2024-05-02T18:08:16.858+0200    ERROR   shim.context    cache/context.go:1316   application event cannot be handled in the current state        {"applicationID": "app0001", "event": "AppTaskCompleted", "state": "Running"}
github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
[...]
2024-05-02T18:08:16.859+0200    ERROR   shim.context    cache/context.go:1316   application event cannot be handled in the current state        {"applicationID": "app0001", "event": "AppStateChange", "state": "Running"}
github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
        /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
2024-05-02T18:08:17.859+0200    INFO    shim.cache.application  cache/application.go:243        task removed    {"appID": "app0001", "taskID": "task0002"}
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
