[ https://issues.apache.org/jira/browse/YUNIKORN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790313#comment-17790313 ]

Wilfred Spiegelenburg commented on YUNIKORN-2203:
-------------------------------------------------

We should not log anything in these cases. We do not log anything in the queue
when we calculate these same values. In a cluster that is continually
allocating, these checks would run on each scheduling attempt for an
application.

That could mean that an application with high priority, or an old one, that
sits at the front of the queue is checked very often. We do not need to see a
log line every time the check has failed. If insight is needed into why an
application is not allocated, the quotas should be checked first, not the logs.

I am OK with completely removing the logs.

> Log spew in QueueTracker.canRunApp()
> ------------------------------------
>
>                 Key: YUNIKORN-2203
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2203
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - scheduler
>            Reporter: Peter Bacsko
>            Assignee: Manikandan R
>            Priority: Critical
>
> {{QueueTracker.canRunApp()}} can flood the logs in various ways.
> 1) This is always printed on DEBUG level:
> {noformat}
>       log.Log(log.SchedUGM).Debug("Checking can run app",
>               zap.Int("tracking type", int(trackType)),
>               zap.String("queue path", qt.queuePath),
>               zap.String("application", applicationID),
>               zap.Strings("hierarchy", hierarchy))
> {noformat}
> This is called in every cycle as long as the application is in the Accepted
> state and can cause real problems at DEBUG level. It does not add much value,
> so I suggest removing it.
> 2) "maxapplications" is hit:
> {noformat}
>               log.Log(log.SchedUGM).Warn("can't run app as allowing new application to run would exceed configured max applications limit of specific user/group",
>                       zap.Int("tracking type", int(trackType)),
>                       zap.String("queue path", qt.queuePath),
>                       zap.Int("current running applications", len(qt.runningApplications)),
>                       zap.Uint64("max running applications", qt.maxRunningApps))
> {noformat}
> This can be useful, but we can't afford logging this constantly. Possible 
> approaches:
> 1) Remove it anyway
> 2) Rate limit
> 3) Log once per applicationID, then log again when the application is finally
> allowed to run
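Approach 3 above can be sketched as a small piece of per-application state that decides whether a log line is due. This is a minimal sketch under stated assumptions: `blockedAppLogger`, `shouldLogBlocked` and `shouldLogAllowed` are hypothetical names, not part of YuniKorn, and the sketch only decides *whether* to log; in the real tracker the `Warn` call above would be guarded by it.

```go
package main

import (
	"fmt"
	"sync"
)

// blockedAppLogger is a hypothetical helper (not part of YuniKorn) for
// approach 3: report the max applications limit once per application ID,
// then report again once that application is allowed to run.
type blockedAppLogger struct {
	mu      sync.Mutex
	blocked map[string]bool // application IDs already reported as blocked
}

func newBlockedAppLogger() *blockedAppLogger {
	return &blockedAppLogger{blocked: make(map[string]bool)}
}

// shouldLogBlocked returns true only the first time an application is seen
// blocked by the limit; repeated checks in later cycles return false.
func (b *blockedAppLogger) shouldLogBlocked(appID string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.blocked[appID] {
		return false
	}
	b.blocked[appID] = true
	return true
}

// shouldLogAllowed returns true if the application was previously reported
// as blocked, and clears that state so a later block is reported again.
func (b *blockedAppLogger) shouldLogAllowed(appID string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.blocked[appID] {
		return false
	}
	delete(b.blocked, appID)
	return true
}

func main() {
	l := newBlockedAppLogger()
	// Simulate three scheduling cycles in which app-1 is blocked.
	for i := 0; i < 3; i++ {
		if l.shouldLogBlocked("app-1") {
			fmt.Println("app-1 blocked by max applications limit")
		}
	}
	// The application is finally allowed to run.
	if l.shouldLogAllowed("app-1") {
		fmt.Println("app-1 now allowed to run")
	}
}
```

Running this prints the "blocked" line once and the "allowed" line once, regardless of how many cycles the check fails. One caveat of this design: an application removed while still blocked would leave an entry in the map, so a real version would also need to clear state on application removal. Approach 2 would instead bound the total log rate (for example with a token-bucket limiter), which caps volume but can still repeat the same application.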



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
