[ 
https://issues.apache.org/jira/browse/YUNIKORN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YUNIKORN-2574:
-----------------------------------
    Description: 
There is a potential data race in PartitionContext: the field 
"totalPartitionResource" is mutated in place. The problem is that the method 
{{GetTotalPartitionResource()}} does not clone it.
{noformat}
func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
        pc.RLock()
        defer pc.RUnlock()

        return pc.totalPartitionResource
}
{noformat}
In general, we should prefer the immutable approach for variables like this, 
just like in {{{}objects.Queue{}}}:
{noformat}
func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported 
bool) error {
        // check this queue: failure stops checks if the allocation is not part 
of a node addition
        newAllocated := resources.Add(sq.allocatedResource, alloc)    <----  
New object
        [ ... removed ... ]
        sq.Lock()
        defer sq.Unlock()
        // all OK update this queue
        sq.allocatedResource = newAllocated
        sq.updateAllocatedResourceMetrics()
        return nil
}

// incPendingResource increments pending resource of this queue and its parents.
func (sq *Queue) incPendingResource(delta *resources.Resource) {
        // update the parent
        if sq.parent != nil {
                sq.parent.incPendingResource(delta)
        }
        // update this queue
        sq.Lock()
        defer sq.Unlock()
        sq.pending = resources.Add(sq.pending, delta)     <---- New object
        sq.updatePendingResourceMetrics()
}
{noformat}

  was:
There is a potential data race in PartitionContext: the field 
"totalPartitionResource" is mutated in place. The problem is that the method 
{{GetTotalPartitionResource()}} does not clone it. 

{noformat}
func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
        pc.RLock()
        defer pc.RUnlock()

        return pc.totalPartitionResource
}
{noformat}

In general, we should prefer the immutable approach for variables like this, 
just like in {{objects.Queue}}:
{noformat}
func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported 
bool) error {
        // check this queue: failure stops checks if the allocation is not part 
of a node addition
        newAllocated := resources.Add(sq.allocatedResource, alloc)
        [ ... removed ... ]
        sq.Lock()
        defer sq.Unlock()
        // all OK update this queue
        sq.allocatedResource = newAllocated
        sq.updateAllocatedResourceMetrics()
        return nil
}

// incPendingResource increments pending resource of this queue and its parents.
func (sq *Queue) incPendingResource(delta *resources.Resource) {
        // update the parent
        if sq.parent != nil {
                sq.parent.incPendingResource(delta)
        }
        // update this queue
        sq.Lock()
        defer sq.Unlock()
        sq.pending = resources.Add(sq.pending, delta)
        sq.updatePendingResourceMetrics()
}
{noformat}




> totalPartitionResource should not be mutated with AddTo/SubFrom
> ---------------------------------------------------------------
>
>                 Key: YUNIKORN-2574
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2574
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>
> There is a potential data race in PartitionContext: the field 
> "totalPartitionResource" is mutated in place. The problem is that the method 
> {{GetTotalPartitionResource()}} does not clone it.
> {noformat}
> func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
>       pc.RLock()
>       defer pc.RUnlock()
>       return pc.totalPartitionResource
> }
> {noformat}
> In general, we should prefer the immutable approach for variables like this, 
> just like in {{{}objects.Queue{}}}:
> {noformat}
> func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported 
> bool) error {
>       // check this queue: failure stops checks if the allocation is not part 
> of a node addition
>       newAllocated := resources.Add(sq.allocatedResource, alloc)    <----  
> New object
>         [ ... removed ... ]
>       sq.Lock()
>       defer sq.Unlock()
>       // all OK update this queue
>       sq.allocatedResource = newAllocated
>       sq.updateAllocatedResourceMetrics()
>       return nil
> }
> // incPendingResource increments pending resource of this queue and its 
> parents.
> func (sq *Queue) incPendingResource(delta *resources.Resource) {
>       // update the parent
>       if sq.parent != nil {
>               sq.parent.incPendingResource(delta)
>       }
>       // update this queue
>       sq.Lock()
>       defer sq.Unlock()
>       sq.pending = resources.Add(sq.pending, delta)     <---- New object
>         sq.updatePendingResourceMetrics()
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to