Dongjoon Hyun created SPARK-28294:
-------------------------------------
Summary: Support `spark.history.fs.cleaner.maxNum` configuration
Key: SPARK-28294
URL: https://issues.apache.org/jira/browse/SPARK-28294
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun
Up to now, Apache Spark maintains the event log directory only with a time-based policy,
`spark.history.fs.cleaner.maxAge`. However, there are two issues.
1. Some file systems have a limitation on the maximum number of files in a single
directory. For example, HDFS's `dfs.namenode.fs-limits.max-directory-items` is
1024 * 1024 by default (see the sketch after this list).
- https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
2. Spark is sometimes unable to clean up some old log files due to
permission issues.
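For reference, a minimal hdfs-site.xml sketch of the limit mentioned in (1); the value shown is just the documented default, not a recommendation:
{code}
<!-- hdfs-site.xml: per-directory item limit referenced in (1). -->
<!-- 1048576 (1024 * 1024) is the default value; shown here only for illustration. -->
<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <value>1048576</value>
</property>
{code}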
To handle both (1) and (2), this issue aims to support an additional number-based
policy configuration for the event log directory,
`spark.history.fs.cleaner.maxNum`. With this policy, Spark can try to keep the number
of files in the event log directory under the configured limit.
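A minimal sketch of how the proposed setting could sit next to the existing cleaner settings in spark-defaults.conf; the interval, age, and number values below are purely illustrative:
{code}
# spark-defaults.conf (History Server) -- illustrative values only
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.interval  1d       # how often the cleaner runs
spark.history.fs.cleaner.maxAge    7d       # existing time-based policy
spark.history.fs.cleaner.maxNum    100000   # number-based policy proposed in this issue
{code}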