[
https://issues.apache.org/jira/browse/YUNIKORN-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479117#comment-17479117
]
Anuraag Nalluri edited comment on YUNIKORN-949 at 1/20/22, 6:58 AM:
--------------------------------------------------------------------
Hi [~pbacsko] and [~wwei]. Do you have a recommendation as to what retention
policy should be implemented? As suggested by Weiwei, I'm thinking of adding
another route that specifies the max file size in MB. If the output file
exceeds this limit, we should remove the oldest entries first, in FIFO order
(from the top of the file).
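
A minimal sketch of the FIFO trimming idea, assuming the state dump is a
sequence of newline-terminated entries with the oldest at the top. The function
name trimOldest and the byte-based limit are illustrative assumptions, not
existing YuniKorn code:

```go
package main

import "bytes"

// trimOldest drops whole newline-terminated entries from the start of data
// (the oldest ones) until what remains is at most maxBytes long.
// Hypothetical helper for illustration only.
func trimOldest(data []byte, maxBytes int) []byte {
	for len(data) > maxBytes {
		i := bytes.IndexByte(data, '\n')
		if i < 0 {
			// One unterminated entry that is larger than the limit:
			// nothing smaller can be kept, so drop it.
			return nil
		}
		// Skip past the first entry and its trailing newline.
		data = data[i+1:]
	}
	return data
}
```

A limit given in MB through the new route would be converted to bytes before
the call (for example, limitMB * 1024 * 1024). Trimming on entry boundaries
keeps every remaining entry intact, at the cost of rewriting the file.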
If this is unspecified, we should probably use a fraction of the total space
available to us on the scheduler container as a reasonable default for the file
size limit. However, if the container is very constrained on free space, this
still doesn't guarantee the log file can reach its max allowed limit without
causing OOM. If this idea is OK, we can pass the container memory limit through
the downward API.
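
As a rough sketch of the default described above, assuming the limit is derived
from the size of the volume that holds the dump file rather than from the
memory limit. The names defaultLimitBytes and volumeSizeBytes and the choice of
fraction are hypothetical, and syscall.Statfs is only available on Unix-like
systems:

```go
package main

import "syscall"

// defaultLimitBytes returns the given fraction of totalBytes.
// The fraction is clamped to the range [0, 1].
// Hypothetical helper for illustration only.
func defaultLimitBytes(totalBytes uint64, fraction float64) uint64 {
	if fraction <= 0 {
		return 0
	}
	if fraction > 1 {
		fraction = 1
	}
	return uint64(float64(totalBytes) * fraction)
}

// volumeSizeBytes reports the total size of the filesystem that holds path.
// Unix-only: it relies on the statfs system call.
func volumeSizeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return st.Blocks * uint64(st.Bsize), nil
}
```

With these, a default could be computed as defaultLimitBytes(total, 0.1) after
total is read from volumeSizeBytes for the dump directory. The 0.1 here is an
arbitrary example value, not a recommendation.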
was (Author: JIRAUSER283086):
Hi [~pbacsko] and [~wwei]. Do you have a recommendation as to what retention
policy should be implemented? As suggested by Weiwei, I'm thinking of adding
another route that specifies the max file size in MB. If the output file
exceeds this limit, we should remove the entries in FIFO order (from top of the
file).
If this is unspecified, we should probably use a fraction of the total space
available to us on the scheduler container as a reasonable default for the file
size limit. However, if the container is very constrained on free space, this
still doesn't guarantee the log file can reach its max allowed limit without
causing OOM. If this idea is ok, we can pass the container memory limit through
downward API.
But if we want to have higher confidence of not running into OOM, perhaps we
can use OS commands to get the remaining "free" space on the container and set
the default as a fraction of that. What do you guys think?
> Location of the state dump file should be configurable
> ------------------------------------------------------
>
> Key: YUNIKORN-949
> URL: https://issues.apache.org/jira/browse/YUNIKORN-949
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Anuraag Nalluri
> Priority: Major
>
> In YUNIKORN-940, the periodic state dump feature was introduced.
> However, the location of the file is fixed: it's the current working
> directory of the YK scheduler binary. This can become a problem with Docker
> containers that have little free space or if the user wants the state to be
> logged frequently.
> The location of the file should be configurable, so it can be written to an
> external volume.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)