[ 
https://issues.apache.org/jira/browse/YUNIKORN-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reassigned YUNIKORN-1077:
-----------------------------------------------

    Assignee: Peter Bacsko

This has been reproduced and we have a root cause (RC) that leads to this case:

Based on the log analysis, a placeholder release with a TIMEOUT causes the 
issue. The release is first counted when the placeholder actually times out. 
It is counted a second time after the shim has confirmed the removal: the core 
processes the release to update the nodes etc. and then sends the release to 
the shim again, so the same release is counted twice.

The core should not send the release to the shim a second time; it should just 
handle the accounting and leave it at that. This is similar to the processing 
performed for the placeholder replacement.
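
To illustrate the double count and the proposed behaviour, here is a minimal 
sketch in Go. It is not the YuniKorn core code: the types containerMetrics and 
core, the resendRelease flag and the simulate helper are all hypothetical names 
made up for this example. It only assumes that running containers are derived 
as allocated minus released, as the counters in the warning log suggest.

```go
package main

import "fmt"

// containerMetrics is a hypothetical stand-in for the two counters that
// appear in the warning log (allocatedContainers, releasedContainers).
type containerMetrics struct {
	allocated int
	released  int
}

// running goes negative as soon as released exceeds allocated.
func (m *containerMetrics) running() int {
	return m.allocated - m.released
}

// core is a heavily simplified, hypothetical model of the scheduler core.
type core struct {
	metrics containerMetrics
	// resendRelease models the behaviour described above: after the shim
	// confirms the removal, the release is sent to the shim again.
	resendRelease bool
}

func (c *core) allocate() {
	c.metrics.allocated++
}

// timeoutPlaceholder models the first count: the placeholder times out
// and the release is sent to the shim.
func (c *core) timeoutPlaceholder() {
	c.metrics.released++
}

// confirmRemoval models the shim confirming the removal. Node accounting
// would be updated here. Only the buggy variant counts the release again.
func (c *core) confirmRemoval() {
	if c.resendRelease {
		c.metrics.released++ // second count for the same placeholder
	}
}

// simulate allocates n placeholders, lets all of them time out, and
// returns the resulting running container count.
func simulate(resendRelease bool, n int) int {
	c := &core{resendRelease: resendRelease}
	for i := 0; i < n; i++ {
		c.allocate()
	}
	for i := 0; i < n; i++ {
		c.timeoutPlaceholder()
		c.confirmRemoval()
	}
	return c.metrics.running()
}

func main() {
	fmt.Println("resend on confirm:", simulate(true, 4))
	fmt.Println("accounting only:", simulate(false, 4))
}
```

With four timed out placeholders the first variant ends at -4 and the second 
at 0, which is the kind of gap between allocated and released seen in the 
reported log line.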

[~pbacsko] is working on a fix.

> Negative Container Count
> ------------------------
>
>                 Key: YUNIKORN-1077
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1077
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - common
>            Reporter: Si Latt
>            Assignee: Peter Bacsko
>            Priority: Major
>
> For some unknown reason, YuniKorn sometimes thinks the container count is 
> negative. This value is displayed on the dashboard. In the YK log I can also 
> see the following log line:
> {code:java}
> 2022-02-08T09:08:17.878Z        WARN    metrics/metrics_collector.go:85 Could not calculate the totalContainersRunning. {"allocatedContainers": 23, "releasedContainers": 27}
> {code}
> The YK team mentioned that this is possibly a metrics bug, hence this 
> report. I haven't been able to reproduce the issue yet.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
