[ https://issues.apache.org/jira/browse/YUNIKORN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801020#comment-17801020 ]
Peter Bacsko commented on YUNIKORN-2284:
----------------------------------------
[~yangpoan] good question... I was thinking of three approaches:
1. Check if sa.allocatedResource + sa.allocatedPlaceholder is zero. If it is,
don't call decUserResourceUsage().
2. Check if the state of the application is terminal (Failed/Succeeded).
3. Check if sa.allocations is empty.
#3 seems to be the smallest change, i.e. there's nothing to remove if that map
is empty. We call RemoveAllAllocations() regardless of the app status, so this
seems to be the simplest option. Since the introduction of UGM, calling it with
no allocations has side effects, so let's just skip the call when we can.
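A minimal Go sketch of approach #3, assuming simplified, hypothetical types (`userTracker`, `application`, `removeAllAllocations`) rather than the real yunikorn-core `Application` and `ugm.Manager`. The UGM decrement only runs when the allocations map is non-empty, so an app whose user tracker has already been removed no longer triggers the error.

```go
package main

import "fmt"

// userTracker is a hypothetical stand-in for the UGM manager. It models only
// the behaviour that matters here: decrementing a user it no longer tracks
// is reported as an error.
type userTracker struct {
	usage  map[string]int64
	errors []string
}

func (t *userTracker) decrease(user string, amount int64) {
	if _, ok := t.usage[user]; !ok {
		t.errors = append(t.errors, "user tracker must be available in userTrackers map: "+user)
		return
	}
	t.usage[user] -= amount
}

// application is a hypothetical, stripped-down Application: allocations maps
// an allocation key to the amount that allocation holds.
type application struct {
	user        string
	allocations map[string]int64
	tracker     *userTracker
}

// removeAllAllocations sketches approach #3: with an empty allocations map
// there is nothing to release, so the UGM decrement is skipped entirely.
// It returns the number of allocations removed.
func (a *application) removeAllAllocations() int {
	if len(a.allocations) == 0 {
		return 0
	}
	var total int64
	removed := 0
	for key, amount := range a.allocations {
		total += amount
		removed++
		delete(a.allocations, key) // deleting during range is safe in Go
	}
	a.tracker.decrease(a.user, total)
	return removed
}

func main() {
	// The user's tracker is already gone, as after the log line
	// "Removing user from manager" for testuser.
	tracker := &userTracker{usage: map[string]int64{}}
	app := &application{user: "testuser", allocations: map[string]int64{}, tracker: tracker}

	fmt.Println(app.removeAllAllocations(), len(tracker.errors)) // 0 0
}
```

Approaches #1 and #2 would replace the `len(a.allocations) == 0` test with a check on the summed allocated and placeholder resources or on the app state; #3 keeps the guard local to the map that is about to be emptied anyway.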
> ERROR message when stopping Service context
> -------------------------------------------
>
> Key: YUNIKORN-2284
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2284
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: PoAn Yang
> Priority: Minor
>
> After YUNIKORN-2233, the scheduler core can be stopped. This causes an issue
> inside the MockScheduler:
> {noformat}
> 2023-12-21T17:58:49.203+0100 INFO core.scheduler.ugm
> ugm/manager.go:136 Removing user from manager {"user": "testuser"}
> ...
> 2023-12-21T17:58:59.209+0100 INFO core.entrypoint
> entrypoint/service_context.go:40 ServiceContext stop all services
> ...
> 2023-12-21T17:58:59.211+0100 INFO core.scheduler.partition
> scheduler/partition_manager.go:144 marking all queues for removal
> {"partitionName": "[rm:123]default"}
> 2023-12-21T17:58:59.211+0100 INFO core.scheduler.queue
> objects/queue.go:952 marking managed queue for deletion {"queue":
> "root"}
> 2023-12-21T17:58:59.212+0100 INFO core.scheduler.fsm
> objects/object_state.go:81 object transition {"object": "root",
> "source": "Active", "destination": "Draining", "event": "Remove"}
> 2023-12-21T17:58:59.212+0100 INFO core.scheduler.queue
> objects/queue.go:952 marking managed queue for deletion {"queue":
> "root.singleleaf"}
> 2023-12-21T17:58:59.212+0100 INFO core.scheduler.fsm
> objects/object_state.go:81 object transition {"object":
> "root.singleleaf", "source": "Active", "destination": "Draining", "event":
> "Remove"}
> 2023-12-21T17:58:59.212+0100 INFO core.scheduler.partition
> scheduler/partition_manager.go:150 removing all applications from
> partition {"numOfApps": 1, "partitionName": "[rm:123]default"}
> 2023-12-21T17:58:59.212+0100 INFO core.scheduler.application
> objects/application.go:608 ask removed successfully from application
> {"appID": "app-1", "ask": "", "pendingDelta": "map[memory:0 vcore:0]"}
> 2023-12-21T17:58:59.212+0100 INFO core.scheduler.queue
> objects/queue.go:837 Application completed and removed from queue
> {"queueName": "root.singleleaf", "applicationID": "app-1"}
> 2023-12-21T17:59:32.848+0100 ERROR core.scheduler.ugm
> ugm/manager.go:118 user tracker must be available in userTrackers map
> {"user": "testuser"}
> github.com/apache/yunikorn-core/pkg/scheduler/ugm.(*Manager).DecreaseTrackedResource
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/ugm/manager.go:118
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).decUserResourceUsage
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1654
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).RemoveAllAllocations
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1843
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeApplication
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition.go:388
> github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).remove
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:156
> github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).Stop
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:97
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).Stop
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/context.go:991
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).Stop
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:217
> github.com/apache/yunikorn-core/pkg/entrypoint.(*ServiceContext).StopAll
> /home/bacskop/repos/yunikorn-core/pkg/entrypoint/service_context.go:50
> github.com/apache/yunikorn-core/pkg/scheduler/tests.(*mockScheduler).Stop
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/mockscheduler_test.go:91
> github.com/apache/yunikorn-core/pkg/scheduler/tests.TestApplicationHistoryTracking
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/application_tracking_test.go:172
> {noformat}
> The problem is that the tracker object no longer exists when
> {{PartitionContext.removeApplication()}} is called. At this point the app is
> also in Completed state, so there is no resource left to decrement.
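The sequence in the log can be modelled with a small hypothetical `manager` (not the real `ugm.Manager` API), assuming it drops a user's tracker once that user's usage returns to zero. The normal release removes the tracker, and the unconditional decrement issued during `Stop()` then finds nothing to decrement.

```go
package main

import "fmt"

// manager is a hypothetical, simplified model of the UGM manager: it keeps a
// usage counter per user and drops the entry once usage returns to zero,
// mirroring the "Removing user from manager" log line. Assumed behaviour,
// not the real ugm.Manager implementation.
type manager struct {
	userTrackers map[string]int64
	errors       []string
}

func newManager() *manager {
	return &manager{userTrackers: map[string]int64{}}
}

func (m *manager) increase(user string, amount int64) {
	m.userTrackers[user] += amount
}

func (m *manager) decrease(user string, amount int64) {
	usage, ok := m.userTrackers[user]
	if !ok {
		// Corresponds to the ERROR line: the tracker is already gone.
		m.errors = append(m.errors, fmt.Sprintf("user tracker must be available in userTrackers map: %s", user))
		return
	}
	usage -= amount
	if usage == 0 {
		delete(m.userTrackers, user) // "Removing user from manager"
		return
	}
	m.userTrackers[user] = usage
}

func main() {
	m := newManager()

	// The app's only allocation is placed and later released normally.
	m.increase("testuser", 1)
	m.decrease("testuser", 1) // usage hits zero, tracker is removed

	// On Stop(), removing the already completed app decrements again
	// without checking whether it still holds any allocations.
	m.decrease("testuser", 0)

	fmt.Println(len(m.errors)) // 1
}
```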
--
This message was sent by Atlassian Jira
(v8.20.10#820010)