Peter Bacsko created YUNIKORN-2284:
--------------------------------------

             Summary: ERROR message when stopping Service context
                 Key: YUNIKORN-2284
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2284
             Project: Apache YuniKorn
          Issue Type: Sub-task
          Components: core - scheduler
            Reporter: Peter Bacsko


After YUNIKORN-2233, the scheduler core can be stopped. This causes an issue 
inside the MockScheduler:

{noformat}
2023-12-21T17:58:49.203+0100    INFO    core.scheduler.partition        
scheduler/partition.go:1296     removing allocation from application    
{"appID": "app-1", "allocationId": "alloc-1-0", "terminationType": 
"STOPPED_BY_RM"}
2023-12-21T17:58:49.203+0100    INFO    core.scheduler.ugm      
ugm/manager.go:136      Removing user from manager      {"user": "testuser"}
2023-12-21T17:58:49.203+0100    INFO    core.scheduler.fsm      
objects/application_state.go:147        Application state transition    
{"appID": "app-1", "source": "Starting", "destination": "Completing", "event": 
"completeApplication"}
entrypoint/service_context.go:40        ServiceContext stop all services
2023-12-21T17:58:59.210+0100    INFO    core.rest       
webservice/handlers.go:972      Connection closed for event stream client       
{"host": "localhost:9080"}
2023-12-21T17:58:59.210+0100    INFO    core.events     
events/event_streaming.go:152   Removing event stream consumer  {"name": 
"localhost:9080", "creation time": "2023-12-21T17:58:49.166+0100"}
2023-12-21T17:58:59.211+0100    INFO    core.metrics    
metrics/metrics_collector.go:98 Stopping internal metrics collector
2023-12-21T17:58:59.211+0100    INFO    core.scheduler  
scheduler/scheduler.go:214      Stopping scheduler & background services
2023-12-21T17:58:59.211+0100    INFO    core.scheduler.health   
scheduler/health_checker.go:143 Stopping periodic health checker
2023-12-21T17:58:59.211+0100    INFO    core.scheduler.nodesusage       
scheduler/nodes_usage_monitor.go:72     Stopping node resource usage monitor
2023-12-21T17:58:59.211+0100    INFO    core.scheduler.context  
scheduler/context.go:989        Stopping background services of partitions
2023-12-21T17:58:59.211+0100    INFO    core.scheduler.partition        
scheduler/partition_manager.go:93       Stopping partition manager      
{"partition": "[rm:123]default"}
2023-12-21T17:58:59.211+0100    INFO    core.scheduler.partition        
scheduler/partition_manager.go:144      marking all queues for removal  
{"partitionName": "[rm:123]default"}
2023-12-21T17:58:59.211+0100    INFO    core.scheduler.queue    
objects/queue.go:952    marking managed queue for deletion      {"queue": 
"root"}
2023-12-21T17:58:59.212+0100    INFO    core.scheduler.fsm      
objects/object_state.go:81      object transition       {"object": "root", 
"source": "Active", "destination": "Draining", "event": "Remove"}
2023-12-21T17:58:59.212+0100    INFO    core.scheduler.queue    
objects/queue.go:952    marking managed queue for deletion      {"queue": 
"root.singleleaf"}
2023-12-21T17:58:59.212+0100    INFO    core.scheduler.fsm      
objects/object_state.go:81      object transition       {"object": 
"root.singleleaf", "source": "Active", "destination": "Draining", "event": 
"Remove"}
2023-12-21T17:58:59.212+0100    INFO    core.scheduler.partition        
scheduler/partition_manager.go:150      removing all applications from 
partition        {"numOfApps": 1, "partitionName": "[rm:123]default"}
2023-12-21T17:58:59.212+0100    INFO    core.scheduler.application      
objects/application.go:608      ask removed successfully from application       
{"appID": "app-1", "ask": "", "pendingDelta": "map[memory:0 vcore:0]"}
2023-12-21T17:58:59.212+0100    INFO    core.scheduler.queue    
objects/queue.go:837    Application completed and removed from queue    
{"queueName": "root.singleleaf", "applicationID": "app-1"}
2023-12-21T17:59:32.848+0100    ERROR   core.scheduler.ugm      
ugm/manager.go:118      user tracker must be available in userTrackers map      
{"user": "testuser"}
github.com/apache/yunikorn-core/pkg/scheduler/ugm.(*Manager).DecreaseTrackedResource
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/ugm/manager.go:118
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).decUserResourceUsage
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1654
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).RemoveAllAllocations
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1843
github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeApplication
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition.go:388
github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).remove
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:156
github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).Stop
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:97
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).Stop
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/context.go:991
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).Stop
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:217
github.com/apache/yunikorn-core/pkg/entrypoint.(*ServiceContext).StopAll
        /home/bacskop/repos/yunikorn-core/pkg/entrypoint/service_context.go:50
github.com/apache/yunikorn-core/pkg/scheduler/tests.(*mockScheduler).Stop
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/mockscheduler_test.go:91
github.com/apache/yunikorn-core/pkg/scheduler/tests.TestApplicationHistoryTracking
        /home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/application_tracking_test.go:172
{noformat}

The problem is that the tracker object no longer exists when 
{{PartitionContext.removeApplication()}} is called. At that point the app is 
also in the Completed state, so it's not necessary to decrement any resource.
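
One possible shape of a fix, as a minimal self-contained sketch rather than the actual yunikorn-core code: skip the decrement when the application has already completed. Every type and function name below ({{manager}}, {{application}}, {{decrease}}, {{removeAllAllocations}}) is a hypothetical stand-in chosen for illustration, and resources are reduced to a plain integer.

```go
package main

import "fmt"

// NOTE: all names here are hypothetical; they only mirror the shape of the
// problem described above and are not the yunikorn-core API.

type appState int

const (
	running appState = iota
	completed
)

// manager stands in for the user/group manager: one tracked amount per user.
type manager struct {
	userTrackers map[string]int // user -> tracked resource amount
	errors       int            // counts "tracker must be available" situations
}

// decrease mimics the failing call: it records an error if the tracker is gone.
func (m *manager) decrease(user string, amount int) bool {
	if _, ok := m.userTrackers[user]; !ok {
		m.errors++
		return false
	}
	m.userTrackers[user] -= amount
	return true
}

type application struct {
	user      string
	state     appState
	allocated int
}

// removeAllAllocations clears the allocations. For an application that has
// already completed, the user tracker may have been removed, so the
// decrement is skipped instead of triggering the error path.
func (a *application) removeAllAllocations(m *manager) {
	if a.state != completed {
		m.decrease(a.user, a.allocated)
	}
	a.allocated = 0
}

func main() {
	// The tracker for "testuser" has already been removed from the manager.
	m := &manager{userTrackers: map[string]int{}}
	app := &application{user: "testuser", state: completed}
	app.removeAllAllocations(m)
	fmt.Println("errors logged:", m.errors) // prints "errors logged: 0"
}
```

Whether the guard belongs in the application, in the partition, or in the manager itself is a design choice this sketch does not settle.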



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
