[ https://issues.apache.org/jira/browse/YUNIKORN-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wilfred Spiegelenburg reassigned YUNIKORN-1077:
-----------------------------------------------
Assignee: Peter Bacsko
This has been reproduced, and we have an RC that leads to this case:
Based on the log analysis, releasing a placeholder with a TIMEOUT causes the
issue. The release is first counted when the placeholder actually times out.
It is counted a second time after the shim has confirmed the removal: the core
processes the release to update the nodes etc. and then sends the release to
the shim again, so it is counted twice.
The core should not send the release to the shim a second time; it should just
handle the accounting and leave it at that. This is similar to the processing
performed for placeholder replacement.
[~pbacsko] is working on a fix.
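
To make the double counting described above concrete, here is a minimal toy
model. It is only a sketch, not YuniKorn code: the class and method names
(ContainerMetrics, Core, placeholder_timeout, shim_confirms_removal) are
hypothetical, and it assumes the released counter is incremented every time
the core sends a release to the shim.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerMetrics:
    """Hypothetical stand-in for the allocated/released container counters."""
    allocated: int = 0
    released: int = 0

    def running(self) -> int:
        # Goes negative as soon as releases outnumber allocations.
        return self.allocated - self.released


@dataclass
class Core:
    """Toy model of the core; not YuniKorn's real types or API."""
    resend_after_confirm: bool
    metrics: ContainerMetrics = field(default_factory=ContainerMetrics)

    def allocate(self) -> None:
        self.metrics.allocated += 1

    def placeholder_timeout(self) -> None:
        # The placeholder times out and the core asks the shim to release it.
        self._send_release_to_shim()

    def shim_confirms_removal(self) -> None:
        # Node accounting would be updated here (omitted in this sketch).
        if self.resend_after_confirm:
            # Behaviour described in the analysis: the release goes out again.
            self._send_release_to_shim()

    def _send_release_to_shim(self) -> None:
        # Assumption: every release sent to the shim is counted once.
        self.metrics.released += 1


def simulate(resend_after_confirm: bool) -> int:
    """Run one placeholder through allocate, timeout, and shim confirmation."""
    core = Core(resend_after_confirm)
    core.allocate()
    core.placeholder_timeout()
    core.shim_confirms_removal()
    return core.metrics.running()
```

With resend_after_confirm=True the single placeholder is released twice and
simulate() returns a negative running count; with False the confirmation only
triggers accounting and the count returns to zero.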
> Negative Container Count
> ------------------------
>
> Key: YUNIKORN-1077
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1077
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - common
> Reporter: Si Latt
> Assignee: Peter Bacsko
> Priority: Major
>
> For some unknown reason, YuniKorn sometimes thinks the container count is
> negative. This info gets displayed on the dashboard. Also, in the YK log, I
> can see the following log lines:
> {code:java}
> 2022-02-08T09:08:17.878Z WARN metrics/metrics_collector.go:85 Could not calculate the totalContainersRunning. {"allocatedContainers": 23, "releasedContainers": 27}
> {code}
> The YK team mentioned it's possibly a metrics bug, hence I am filing this
> report. I haven't been able to repro the issue yet.
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)