[
https://issues.apache.org/jira/browse/YUNIKORN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790322#comment-17790322
]
Wilfred Spiegelenburg commented on YUNIKORN-2203:
-------------------------------------------------
Other locations in the UGM that currently log at DEBUG and that I think should
log at INFO level:
* L136: "Removing user from manager"
* L162: "Removing group from manager"
* L222: "Group tracker doesn't exists. Creating appGroup tracker"
* L231: "Group tracker set for user application"
* L565: "User tracker doesn't exists. Creating user tracker."
These are all major, low-volume changes that should be tracked in the logs.
> Log spew in QueueTracker.canRunApp()
> ------------------------------------
>
> Key: YUNIKORN-2203
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2203
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Manikandan R
> Priority: Critical
>
> {{QueueTracker.canRunApp()}} can flood the logs in various ways.
> 1) This is always printed on DEBUG level:
> {noformat}
> log.Log(log.SchedUGM).Debug("Checking can run app",
>     zap.Int("tracking type", int(trackType)),
>     zap.String("queue path", qt.queuePath),
>     zap.String("application", applicationID),
>     zap.Strings("hierarchy", hierarchy))
> {noformat}
> This is called in every cycle as long as the application is in Accepted state
> and can truly cause problems on DEBUG level. It does not add too much value,
> so I suggest removing it.
> 2) "maxapplications" is hit:
> {noformat}
> log.Log(log.SchedUGM).Warn("can't run app as allowing new application to run would exceed configured max applications limit of specific user/group",
>     zap.Int("tracking type", int(trackType)),
>     zap.String("queue path", qt.queuePath),
>     zap.Int("current running applications", len(qt.runningApplications)),
>     zap.Uint64("max running applications", qt.maxRunningApps))
> {noformat}
> This can be useful, but we can't afford logging this constantly. Possible
> approaches:
> 1) Remove it anyway
> 2) Rate limit
> 3) Log once per applicationID, then log it again when finally the application
> is allowed to run
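
Approach 3 in the quoted description (log once per applicationID, then log again
when the application is finally allowed to run) could be sketched roughly as
follows. This is only an illustration and not YuniKorn code: the type
appLimitLogger and its methods blocked() and allowed() are hypothetical names,
and fmt.Println stands in for the real logger.

```go
package main

import (
	"fmt"
	"sync"
)

// appLimitLogger is a hypothetical helper (not part of YuniKorn) that
// remembers which applications were already reported as blocked.
type appLimitLogger struct {
	mu       sync.Mutex
	reported map[string]bool
}

func newAppLimitLogger() *appLimitLogger {
	return &appLimitLogger{reported: make(map[string]bool)}
}

// blocked returns true only the first time an application is reported
// as blocked; repeated calls for the same ID return false.
func (l *appLimitLogger) blocked(appID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.reported[appID] {
		return false
	}
	l.reported[appID] = true
	return true
}

// allowed returns true if the application was previously reported as
// blocked, and forgets it so a later block is reported again.
func (l *appLimitLogger) allowed(appID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.reported[appID] {
		return false
	}
	delete(l.reported, appID)
	return true
}

func main() {
	l := newAppLimitLogger()
	// Simulate several scheduling cycles hitting the limit for the same app.
	for cycle := 0; cycle < 3; cycle++ {
		if l.blocked("app-1") {
			fmt.Println("app-1 blocked by max applications limit")
		}
	}
	// The limit no longer applies: report the transition once.
	if l.allowed("app-1") {
		fmt.Println("app-1 is now allowed to run")
	}
}
```

With this shape each message is emitted once per state change rather than once
per scheduling cycle. The map would need cleanup when an application is removed
while still blocked, which the sketch does not cover.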