[
https://issues.apache.org/jira/browse/YUNIKORN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790292#comment-17790292
]
Peter Bacsko edited comment on YUNIKORN-2203 at 11/27/23 10:53 PM:
-------------------------------------------------------------------
cc [[email protected]] [~wilfreds]. I marked this as Critical because it's
a big issue when using "maxapplications".
I really like approach #3 because that prints the least amount of output and
it's the most informative.
In general, we really have to be mindful of what we log in the UGM code. Code here
is called very frequently. I think we should carefully review the existing code
to identify potential excessive logging.
> Log spew in QueueTracker.canRunApp()
> ------------------------------------
>
> Key: YUNIKORN-2203
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2203
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Manikandan R
> Priority: Critical
>
> {{QueueTracker.canRunApp()}} can flood the logs in various ways.
> 1) This is always printed on DEBUG level:
> {noformat}
> log.Log(log.SchedUGM).Debug("Checking can run app",
>     zap.Int("tracking type", int(trackType)),
>     zap.String("queue path", qt.queuePath),
>     zap.String("application", applicationID),
>     zap.Strings("hierarchy", hierarchy))
> {noformat}
> This is called in every cycle for as long as the application is in the Accepted
> state, so it can flood the logs at DEBUG level. It adds little value,
> so I suggest removing it.
> 2) "maxapplications" is hit:
> {noformat}
> log.Log(log.SchedUGM).Warn("can't run app as allowing new application to run would exceed configured max applications limit of specific user/group",
>     zap.Int("tracking type", int(trackType)),
>     zap.String("queue path", qt.queuePath),
>     zap.Int("current running applications", len(qt.runningApplications)),
>     zap.Uint64("max running applications", qt.maxRunningApps))
> {noformat}
> This can be useful, but we can't afford to log it constantly. Possible
> approaches:
> 1) Remove it anyway
> 2) Rate limit
> 3) Log once per applicationID, then log again when the application is finally
> allowed to run
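
For illustration, approach 3 from the list above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the YuniKorn implementation: `limitLogger`, `newLimitLogger`, `observe` and the message texts are hypothetical names, and a plain callback stands in for the real logger so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// limitLogger is a hypothetical helper (not part of YuniKorn). It remembers
// which application IDs have already been reported as blocked, so the
// "limit hit" message is emitted once per application instead of every cycle.
type limitLogger struct {
	mu      sync.Mutex
	blocked map[string]struct{}
	log     func(msg, applicationID string) // stand-in for the real logger
}

func newLimitLogger(log func(msg, applicationID string)) *limitLogger {
	return &limitLogger{blocked: make(map[string]struct{}), log: log}
}

// observe records the outcome of one limit check for applicationID.
// It logs only on state transitions: the first rejection, and the first
// successful check after a rejection.
func (l *limitLogger) observe(applicationID string, canRun bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	_, wasBlocked := l.blocked[applicationID]
	switch {
	case !canRun && !wasBlocked:
		l.blocked[applicationID] = struct{}{}
		l.log("application blocked by max applications limit", applicationID)
	case canRun && wasBlocked:
		delete(l.blocked, applicationID)
		l.log("application is now allowed to run", applicationID)
	}
}

func main() {
	l := newLimitLogger(func(msg, applicationID string) {
		fmt.Println(msg, applicationID)
	})
	// Three rejected checks in a row produce a single log line.
	for i := 0; i < 3; i++ {
		l.observe("app-1", false)
	}
	// The first successful check produces the second and last line.
	l.observe("app-1", true)
}
```

One caveat with this kind of bookkeeping: an application that is removed while still blocked would leave its entry in the map, so a real version would also need to clear the entry when the application goes away.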