[
https://issues.apache.org/jira/browse/YUNIKORN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Bacsko updated YUNIKORN-2574:
-----------------------------------
Description:
There is a potential data race in {{PartitionContext}}: the field
{{totalPartitionResource}} is mutated in place. The problem is that the method
{{GetTotalPartitionResource()}} does not clone it.
{noformat}
func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
pc.RLock()
defer pc.RUnlock()
return pc.totalPartitionResource
}
{noformat}
In general, we should prefer the immutable approach for variables like this,
just like in {{{}objects.Queue{}}}:
{noformat}
func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported
bool) error {
// check this queue: failure stops checks if the allocation is not part
of a node addition
newAllocated := resources.Add(sq.allocatedResource, alloc) <----
New object
[ ... removed ... ]
sq.Lock()
defer sq.Unlock()
// all OK update this queue
sq.allocatedResource = newAllocated
sq.updateAllocatedResourceMetrics()
return nil
}
// incPendingResource increments pending resource of this queue and its parents.
func (sq *Queue) incPendingResource(delta *resources.Resource) {
// update the parent
if sq.parent != nil {
sq.parent.incPendingResource(delta)
}
// update this queue
sq.Lock()
defer sq.Unlock()
sq.pending = resources.Add(sq.pending, delta) <---- New object
sq.updatePendingResourceMetrics()
}
{noformat}
was:
There is a potential data race in PartitionContext: the field
"totalPartitionResource" is mutated in place. The problem is that the method
{{GetTotalPartitionResource()}} does not clone it.
{noformat}
func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
pc.RLock()
defer pc.RUnlock()
return pc.totalPartitionResource
}
{noformat}
In general, we should prefer the immutable approach for variables like this,
just like in {{{}objects.Queue{}}}:
{noformat}
func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported
bool) error {
// check this queue: failure stops checks if the allocation is not part
of a node addition
newAllocated := resources.Add(sq.allocatedResource, alloc) <----
New object
[ ... removed ... ]
sq.Lock()
defer sq.Unlock()
// all OK update this queue
sq.allocatedResource = newAllocated
sq.updateAllocatedResourceMetrics()
return nil
}
// incPendingResource increments pending resource of this queue and its parents.
func (sq *Queue) incPendingResource(delta *resources.Resource) {
// update the parent
if sq.parent != nil {
sq.parent.incPendingResource(delta)
}
// update this queue
sq.Lock()
defer sq.Unlock()
sq.pending = resources.Add(sq.pending, delta) <---- New object
sq.updatePendingResourceMetrics()
}
{noformat}
> totalPartitionResource should not be mutated with AddTo/SubFrom
> ---------------------------------------------------------------
>
> Key: YUNIKORN-2574
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2574
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.4.0, 1.5.0
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
>
> There is a potential data race in {{PartitionContext}}: the field
> {{totalPartitionResource}} is mutated in place. The problem is that the
> method {{GetTotalPartitionResource()}} does not clone it.
> {noformat}
> func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
> pc.RLock()
> defer pc.RUnlock()
> return pc.totalPartitionResource
> }
> {noformat}
> In general, we should prefer the immutable approach for variables like this,
> just like in {{{}objects.Queue{}}}:
> {noformat}
> func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported
> bool) error {
> // check this queue: failure stops checks if the allocation is not part
> of a node addition
> newAllocated := resources.Add(sq.allocatedResource, alloc) <----
> New object
> [ ... removed ... ]
> sq.Lock()
> defer sq.Unlock()
> // all OK update this queue
> sq.allocatedResource = newAllocated
> sq.updateAllocatedResourceMetrics()
> return nil
> }
> // incPendingResource increments pending resource of this queue and its
> parents.
> func (sq *Queue) incPendingResource(delta *resources.Resource) {
> // update the parent
> if sq.parent != nil {
> sq.parent.incPendingResource(delta)
> }
> // update this queue
> sq.Lock()
> defer sq.Unlock()
> sq.pending = resources.Add(sq.pending, delta) <---- New object
> sq.updatePendingResourceMetrics()
> }
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]