danmagyar opened a new pull request #12636:
URL: https://github.com/apache/flink/pull/12636


   ## What is the purpose of the change
   
   *This pull request introduces a new config option 
`historyserver.max.history.size` and with that the ability to limit the number 
of job archives kept in the history server, similarly to Spark’s 
`spark.history.retainedApplications`.*
   
   * Upon refreshing the archive directories, if the number of job archives 
exceeds the configured limit, the least recently modified archive files are 
deleted and their content is removed from the history server’s cache.
   * To provide a straightforward user experience, this config option is 
independent of the value set for `historyserver.archive.clean-expired-jobs`.
   * The default value of this config option is `50`, matching the 
respective default value in Spark.
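
The retention rule described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in this pull request: the class `ArchiveRetentionSketch` and the method `trimToLimit` are hypothetical names, the archives are assumed to be regular files in a single local directory, and cache eviction is left to the caller, which receives the list of deleted paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: keeps at most maxHistorySize archives.
class ArchiveRetentionSketch {

    /** Deletes the least recently modified files beyond the limit and returns them. */
    static List<Path> trimToLimit(Path archiveDir, int maxHistorySize) throws IOException {
        if (maxHistorySize < 0) {
            throw new IllegalArgumentException("maxHistorySize must not be negative");
        }
        List<Path> archives;
        try (Stream<Path> files = Files.list(archiveDir)) {
            archives = files.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        // Newest first, so everything past index maxHistorySize is the oldest surplus.
        archives.sort(
                Comparator.comparingLong((Path p) -> p.toFile().lastModified()).reversed());

        List<Path> deleted = new ArrayList<>();
        int keep = Math.min(maxHistorySize, archives.size());
        for (Path surplus : archives.subList(keep, archives.size())) {
            Files.delete(surplus);
            deleted.add(surplus); // the caller would also evict these from its cache
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-archives");
        long base = System.currentTimeMillis() - 3_600_000L;
        for (int i = 0; i < 5; i++) {
            Path archive = Files.createFile(dir.resolve("job-" + i));
            // Space the modification times one minute apart; job-0 is the oldest.
            Files.setLastModifiedTime(archive, FileTime.fromMillis(base + i * 60_000L));
        }
        System.out.println("deleted: " + trimToLimit(dir, 3));
    }
}
```

Returning the deleted paths keeps the file system cleanup separate from whatever cache bookkeeping the caller performs.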
   
   
   ## Brief change log
   
     - *The `HistoryServer` constructor loads the config option and passes it 
to the `HistoryServerArchiveFetcher`*
     - *The `HistoryServerArchiveFetcher`’s `JobArchiveFetcherTask`, upon 
refreshing the archive directories, cleans up the least recently modified 
archives from the filesystem and the cache*
   
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
     - *Added a test that creates job archive files under the jobmanager’s 
directory, with `lastModified` values 1 minute apart from each other, both 
before and after the history server starts, and validates that the least 
recently modified files are removed upon refresh*
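
The staging step of such a test might look roughly like the sketch below. It is not the test added in this pull request: `ArchiveStagingSketch`, `stageArchives`, and `expectedSurvivors` are hypothetical names, and the sketch only prepares files with controlled timestamps and computes which of them a given limit should keep. A real test would compare that expectation with the directory contents after the history server refreshes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper, not part of Flink.
class ArchiveStagingSketch {

    /** Creates count empty files; file i is i minutes newer than the oldest one. */
    static List<Path> stageArchives(Path dir, String prefix, int count, Instant oldest)
            throws IOException {
        List<Path> created = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Path archive = Files.createFile(dir.resolve(prefix + "-" + i));
            Files.setLastModifiedTime(
                    archive, FileTime.from(oldest.plus(Duration.ofMinutes(i))));
            created.add(archive);
        }
        return created;
    }

    /** Given archives ordered oldest to newest, returns those the limit should keep. */
    static List<Path> expectedSurvivors(List<Path> oldestToNewest, int maxHistorySize) {
        // Assumes maxHistorySize is not negative.
        int from = Math.max(0, oldestToNewest.size() - maxHistorySize);
        return new ArrayList<>(oldestToNewest.subList(from, oldestToNewest.size()));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staged-archives");
        Instant oldest = Instant.now().minus(Duration.ofHours(1));
        List<Path> staged = stageArchives(dir, "job", 4, oldest);
        System.out.println("expected to survive a limit of 2: "
                + expectedSurvivors(staged, 2));
    }
}
```

Setting the timestamps explicitly, rather than sleeping between file creations, keeps the ordering deterministic and the test fast.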
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? docs
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

