Peter Bacsko created YUNIKORN-2284:
--------------------------------------
Summary: ERROR message when stopping Service context
Key: YUNIKORN-2284
URL: https://issues.apache.org/jira/browse/YUNIKORN-2284
Project: Apache YuniKorn
Issue Type: Sub-task
Components: core - scheduler
Reporter: Peter Bacsko
After YUNIKORN-2233, the scheduler core can be stopped. Stopping it logs an
ERROR inside the MockScheduler:
{noformat}
2023-12-21T17:58:49.203+0100 INFO core.scheduler.partition
scheduler/partition.go:1296 removing allocation from application
{"appID": "app-1", "allocationId": "alloc-1-0", "terminationType":
"STOPPED_BY_RM"}
2023-12-21T17:58:49.203+0100 INFO core.scheduler.ugm
ugm/manager.go:136 Removing user from manager {"user": "testuser"}
2023-12-21T17:58:49.203+0100 INFO core.scheduler.fsm
objects/application_state.go:147 Application state transition
{"appID": "app-1", "source": "Starting", "destination": "Completing", "event":
"completeApplication"}
entrypoint/service_context.go:40 ServiceContext stop all services
2023-12-21T17:58:59.210+0100 INFO core.rest
webservice/handlers.go:972 Connection closed for event stream client
{"host": "localhost:9080"}
2023-12-21T17:58:59.210+0100 INFO core.events
events/event_streaming.go:152 Removing event stream consumer {"name":
"localhost:9080", "creation time": "2023-12-21T17:58:49.166+0100"}
2023-12-21T17:58:59.211+0100 INFO core.metrics
metrics/metrics_collector.go:98 Stopping internal metrics collector
2023-12-21T17:58:59.211+0100 INFO core.scheduler
scheduler/scheduler.go:214 Stopping scheduler & background services
2023-12-21T17:58:59.211+0100 INFO core.scheduler.health
scheduler/health_checker.go:143 Stopping periodic health checker
2023-12-21T17:58:59.211+0100 INFO core.scheduler.nodesusage
scheduler/nodes_usage_monitor.go:72 Stopping node resource usage monitor
2023-12-21T17:58:59.211+0100 INFO core.scheduler.context
scheduler/context.go:989 Stopping background services of partitions
2023-12-21T17:58:59.211+0100 INFO core.scheduler.partition
scheduler/partition_manager.go:93 Stopping partition manager
{"partition": "[rm:123]default"}
2023-12-21T17:58:59.211+0100 INFO core.scheduler.partition
scheduler/partition_manager.go:144 marking all queues for removal
{"partitionName": "[rm:123]default"}
2023-12-21T17:58:59.211+0100 INFO core.scheduler.queue
objects/queue.go:952 marking managed queue for deletion {"queue":
"root"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.fsm
objects/object_state.go:81 object transition {"object": "root",
"source": "Active", "destination": "Draining", "event": "Remove"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.queue
objects/queue.go:952 marking managed queue for deletion {"queue":
"root.singleleaf"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.fsm
objects/object_state.go:81 object transition {"object":
"root.singleleaf", "source": "Active", "destination": "Draining", "event":
"Remove"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.partition
scheduler/partition_manager.go:150 removing all applications from
partition {"numOfApps": 1, "partitionName": "[rm:123]default"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.application
objects/application.go:608 ask removed successfully from application
{"appID": "app-1", "ask": "", "pendingDelta": "map[memory:0 vcore:0]"}
2023-12-21T17:58:59.212+0100 INFO core.scheduler.queue
objects/queue.go:837 Application completed and removed from queue
{"queueName": "root.singleleaf", "applicationID": "app-1"}
2023-12-21T17:59:32.848+0100 ERROR core.scheduler.ugm
ugm/manager.go:118 user tracker must be available in userTrackers map
{"user": "testuser"}
github.com/apache/yunikorn-core/pkg/scheduler/ugm.(*Manager).DecreaseTrackedResource
/home/bacskop/repos/yunikorn-core/pkg/scheduler/ugm/manager.go:118
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).decUserResourceUsage
/home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1654
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).RemoveAllAllocations
/home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1843
github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeApplication
/home/bacskop/repos/yunikorn-core/pkg/scheduler/partition.go:388
github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).remove
/home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:156
github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).Stop
/home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:97
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).Stop
/home/bacskop/repos/yunikorn-core/pkg/scheduler/context.go:991
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).Stop
/home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:217
github.com/apache/yunikorn-core/pkg/entrypoint.(*ServiceContext).StopAll
/home/bacskop/repos/yunikorn-core/pkg/entrypoint/service_context.go:50
github.com/apache/yunikorn-core/pkg/scheduler/tests.(*mockScheduler).Stop
/home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/mockscheduler_test.go:91
github.com/apache/yunikorn-core/pkg/scheduler/tests.TestApplicationHistoryTracking
/home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/application_tracking_test.go:172
{noformat}
The problem is that the tracker object no longer exists when
{{PartitionContext.removeApplication()}} is called. At that point the app is
also in the Completed state, so it's not necessary to decrement any resources.
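
To illustrate the idea, here is a minimal, self-contained sketch of a guard that skips the decrement for an app that is already Completed. It is not the yunikorn-core code: the {{manager}}, {{userTracker}} and {{application}} types, the {{decrease}} and {{removeApplication}} functions, and the integer resource amounts are all hypothetical simplifications. Only the error text is taken from the log above.

```go
package main

import "fmt"

// stateCompleted is a hypothetical state name used only in this sketch.
const stateCompleted = "Completed"

// userTracker is a simplified, hypothetical per-user usage record.
type userTracker struct {
	used int
}

// manager is a simplified, hypothetical stand-in for the user/group manager.
type manager struct {
	trackers map[string]*userTracker
}

// decrease lowers the tracked usage for a user. It returns an error when no
// tracker exists, which mirrors the ERROR seen in the log above.
func (m *manager) decrease(user string, amount int) error {
	ut, ok := m.trackers[user]
	if !ok {
		return fmt.Errorf("user tracker must be available in userTrackers map: %s", user)
	}
	ut.used -= amount
	return nil
}

// application is a simplified, hypothetical application object.
type application struct {
	user      string
	state     string
	allocated int
}

// removeApplication skips the usage decrement for an app that is already
// Completed, since its tracker may have been removed earlier.
func removeApplication(m *manager, app *application) error {
	if app.state == stateCompleted {
		return nil
	}
	return m.decrease(app.user, app.allocated)
}

func main() {
	// The tracker for "testuser" is already gone, as in the failing test.
	m := &manager{trackers: map[string]*userTracker{}}
	app := &application{user: "testuser", state: stateCompleted}

	fmt.Println("unguarded decrement:", m.decrease(app.user, app.allocated))
	fmt.Println("guarded removal:", removeApplication(m, app))
}
```

Whether the check belongs in the remove path or inside the tracker lookup itself is a design choice this sketch does not settle. It only shows that a Completed app can be removed without touching a tracker that is no longer there.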