[ 
https://issues.apache.org/jira/browse/YUNIKORN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801020#comment-17801020
 ] 

Peter Bacsko commented on YUNIKORN-2284:
----------------------------------------

[~yangpoan] good question...  I was thinking of three approaches:

1. Check if sa.allocatedResource + sa.allocatedPlaceholder is zero. If it is, 
don't call decUserResourceUsage(). 
2. Check if the state of the application is terminal (Failed/Succeeded). 
3. Check if sa.allocations is empty.

#3 seems to be the smallest change, i.e. there's nothing to remove if that map 
is empty. We call RemoveAllAllocations() regardless of the app status, so an 
emptiness check there is the simplest option. With the introduction of UGM, 
there are side effects if we call this with no allocations, so let's just skip 
it if we can.
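
To make #3 concrete, here is a rough sketch of the early return. It does not 
use the real yunikorn-core code: the userTracker and application types, their 
fields, and the integer "resource" amounts are simplified, hypothetical 
stand-ins. They only illustrate that returning early on an empty allocations 
map never touches a tracker that may already be gone.

```go
package main

// Hypothetical, simplified stand-ins for the scheduler types. The real
// Application and UGM Manager in yunikorn-core have more state and locking.

// userTracker mimics the user/group manager: it tracks usage per user and
// counts an error when asked to decrease usage for an unknown user.
type userTracker struct {
	usage  map[string]int
	errors int
}

// decrease lowers the tracked usage for a user. If the user is no longer
// tracked, it records an error, analogous to the ERROR line in the log.
func (t *userTracker) decrease(user string, amount int) {
	if _, ok := t.usage[user]; !ok {
		t.errors++
		return
	}
	t.usage[user] -= amount
}

// application holds allocations keyed by an allocation ID (assumed layout).
type application struct {
	user        string
	allocations map[string]int // allocation key -> resource amount
	tracker     *userTracker
}

// removeAllAllocations sketches approach #3: if the allocations map is
// empty, return before any user resource accounting happens.
func (a *application) removeAllAllocations() []string {
	if len(a.allocations) == 0 {
		return nil // nothing to remove, so skip the tracker entirely
	}
	removed := make([]string, 0, len(a.allocations))
	for key, amount := range a.allocations {
		a.tracker.decrease(a.user, amount)
		removed = append(removed, key)
	}
	a.allocations = map[string]int{}
	return removed
}
```

With this shape, an app that has no allocations left and whose user has already 
been removed from the tracker returns nil without recording an error, while an 
app that still holds allocations is accounted for as before.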

> ERROR message when stopping Service context
> -------------------------------------------
>
>                 Key: YUNIKORN-2284
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2284
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - scheduler
>            Reporter: Peter Bacsko
>            Assignee: PoAn Yang
>            Priority: Minor
>
> After YUNIKORN-2233, the scheduler core can be stopped. This causes an issue 
> inside the MockScheduler:
> {noformat}
> 2023-12-21T17:58:49.203+0100  INFO    core.scheduler.ugm      
> ugm/manager.go:136      Removing user from manager      {"user": "testuser"}
> ...
> 2023-12-21T17:58:59.209+0100  INFO    core.entrypoint 
> entrypoint/service_context.go:40        ServiceContext stop all services
> ...
> 2023-12-21T17:58:59.211+0100  INFO    core.scheduler.partition        
> scheduler/partition_manager.go:144      marking all queues for removal  
> {"partitionName": "[rm:123]default"}
> 2023-12-21T17:58:59.211+0100  INFO    core.scheduler.queue    
> objects/queue.go:952    marking managed queue for deletion      {"queue": 
> "root"}
> 2023-12-21T17:58:59.212+0100  INFO    core.scheduler.fsm      
> objects/object_state.go:81      object transition       {"object": "root", 
> "source": "Active", "destination": "Draining", "event": "Remove"}
> 2023-12-21T17:58:59.212+0100  INFO    core.scheduler.queue    
> objects/queue.go:952    marking managed queue for deletion      {"queue": 
> "root.singleleaf"}
> 2023-12-21T17:58:59.212+0100  INFO    core.scheduler.fsm      
> objects/object_state.go:81      object transition       {"object": 
> "root.singleleaf", "source": "Active", "destination": "Draining", "event": 
> "Remove"}
> 2023-12-21T17:58:59.212+0100  INFO    core.scheduler.partition        
> scheduler/partition_manager.go:150      removing all applications from 
> partition        {"numOfApps": 1, "partitionName": "[rm:123]default"}
> 2023-12-21T17:58:59.212+0100  INFO    core.scheduler.application      
> objects/application.go:608      ask removed successfully from application     
>   {"appID": "app-1", "ask": "", "pendingDelta": "map[memory:0 vcore:0]"}
> 2023-12-21T17:58:59.212+0100  INFO    core.scheduler.queue    
> objects/queue.go:837    Application completed and removed from queue    
> {"queueName": "root.singleleaf", "applicationID": "app-1"}
> 2023-12-21T17:59:32.848+0100  ERROR   core.scheduler.ugm      
> ugm/manager.go:118      user tracker must be available in userTrackers map    
>   {"user": "testuser"}
> github.com/apache/yunikorn-core/pkg/scheduler/ugm.(*Manager).DecreaseTrackedResource
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/ugm/manager.go:118
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).decUserResourceUsage
>       
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1654
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).RemoveAllAllocations
>       
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/objects/application.go:1843
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeApplication
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition.go:388
> github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).remove
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:156
> github.com/apache/yunikorn-core/pkg/scheduler.(*partitionManager).Stop
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/partition_manager.go:97
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).Stop
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/context.go:991
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).Stop
>       /home/bacskop/repos/yunikorn-core/pkg/scheduler/scheduler.go:217
> github.com/apache/yunikorn-core/pkg/entrypoint.(*ServiceContext).StopAll
>       /home/bacskop/repos/yunikorn-core/pkg/entrypoint/service_context.go:50
> github.com/apache/yunikorn-core/pkg/scheduler/tests.(*mockScheduler).Stop
>       
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/mockscheduler_test.go:91
> github.com/apache/yunikorn-core/pkg/scheduler/tests.TestApplicationHistoryTracking
>       
> /home/bacskop/repos/yunikorn-core/pkg/scheduler/tests/application_tracking_test.go:172
> {noformat}
> The problem is that the tracker object no longer exists when 
> {{PartitionContext.removeApplication()}} is called. At this point the app is 
> also in Completed state, so it's not necessary to decrement any resource.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
