[ 
https://issues.apache.org/jira/browse/YUNIKORN-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479434#comment-17479434
 ] 

Peter Bacsko commented on YUNIKORN-949:
---------------------------------------

Yes, we periodically append JSON data to a file.

I'm not sure what the most reasonable solution is here. The state of a larger 
cluster (e.g. 500 nodes) can be pretty big, although I don't have exact figures 
or even estimates for the size of the expected output. Certain fields grow 
linearly with the size of the cluster:

{code}
type AggregatedStateInfo struct {
        Timestamp          string
        Partitions         []*dao.PartitionInfo
        Applications       []*dao.ApplicationDAOInfo
        AppHistory         []*dao.ApplicationHistoryDAOInfo
        Nodes              []*dao.NodesDAOInfo
        NodesUtilization   []*dao.NodesUtilDAOInfo
        ClusterInfo        []*dao.ClusterDAOInfo
        ClusterUtilization []*dao.ClustersUtilDAOInfo
        ContainerHistory   []*dao.ContainerHistoryDAOInfo
        Queues             []*dao.PartitionDAOInfo
        LogLevel           string
}
{code}

Chances are that the bigger the cluster, the more applications you have, so 
that's also a factor.

We can follow the traditional rolling approach (e.g. once the file size reaches 
10MB, we rename it to state_dump.txt.001) and cap how many previous state dump 
files we keep.

Wilfred also suggested using an external volume that gets mounted into the pod 
and making the path configurable, so that the output is written to the external 
location without limitations. This delegates the responsibility of providing 
enough free space to the user.

> Location of the state dump file should be configurable
> ------------------------------------------------------
>
>                 Key: YUNIKORN-949
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-949
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Peter Bacsko
>            Assignee: Anuraag Nalluri
>            Priority: Major
>
> In YUNIKORN-940, the periodic state dump feature was introduced.
> However, the location of the file is fixed: it's the current working 
> directory of the YK scheduler binary. This can become a problem with Docker 
> containers that have little free space, or if the user wants the state to be 
> logged frequently.
> The location of the file should be configurable, so it can be written to an 
> external volume.



