[ https://issues.apache.org/jira/browse/SPARK-56044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun reassigned SPARK-56044:
-------------------------------------
Assignee: Shuai Lu
> HistoryServerDiskManager does not delete app store on release when app is not
> in active map
> -------------------------------------------------------------------------------------------
>
> Key: SPARK-56044
> URL: https://issues.apache.org/jira/browse/SPARK-56044
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.1, 4.1.0, 3.5.7, 4.2.0, 4.1.1
> Reporter: Shuai Lu
> Assignee: Shuai Lu
> Priority: Major
> Labels: pull-request-available
>
> In {{HistoryServerDiskManager.release()}}, the store directory deletion is
> gated inside an {{oldSizeOpt.foreach}} block, which only executes when the
> application is present in the in-memory {{active}} map:
> {code:scala}
> val oldSizeOpt = active.synchronized {
>   active.remove(appId -> attemptId)
> }
> oldSizeOpt.foreach { oldSize =>
>   val path = appStorePath(appId, attemptId)
>   updateUsage(-oldSize, committed = true)
>   if (path.isDirectory()) {
>     if (delete) {
>       deleteStore(path) // never reached if app is not in active map
>     }
>     ...
>   }
> }
> {code}
> The {{active}} map is in-memory only and is empty after a History Server
> restart. When log expiration triggers {{release(appId, attemptId, delete =
> true)}} for an app that was never reopened after a restart, {{oldSizeOpt}} is
> {{None}}, the block is skipped entirely, and the on-disk store directory
> (.ldb / .rdb) is never deleted. Over time these orphaned store directories
> accumulate, consuming disk space indefinitely.
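> To illustrate why the directory survives, here is a toy Scala model of the control flow quoted above. It is a sketch, not Spark's class: {{BuggyDiskManager}}, {{open}}, and the {{onDisk}} map (standing in for the store directories on disk) are hypothetical names, and a second instance sharing the same map plays the role of a History Server restart.

```scala
import scala.collection.mutable

// Hypothetical toy model, NOT Spark's HistoryServerDiskManager.
// `onDisk` stands in for store directories on disk (survives restarts);
// `active` is the in-memory map that starts empty in every new instance.
class BuggyDiskManager(onDisk: mutable.Map[String, Long]) {
  private val active = mutable.Map.empty[String, Long]

  // Opening an app UI: the store exists on disk and is tracked in memory.
  def open(app: String, size: Long): Unit = active.synchronized {
    onDisk(app) = size
    active(app) = size
  }

  // Same shape as the quoted release(): deletion only happens inside foreach,
  // so an app missing from `active` is never deleted.
  def release(app: String, delete: Boolean): Unit = {
    val oldSizeOpt = active.synchronized { active.remove(app) }
    oldSizeOpt.foreach { _ =>
      if (onDisk.contains(app) && delete) onDisk.remove(app)
    }
  }
}

val disk = mutable.Map.empty[String, Long]
val beforeRestart = new BuggyDiskManager(disk)
beforeRestart.open("app-1", 100L)
beforeRestart.release("app-1", delete = false) // UI closed, store kept on disk

val afterRestart = new BuggyDiskManager(disk)  // same "disk", empty active map
afterRestart.release("app-1", delete = true)   // log expiration cleanup
println(disk.contains("app-1"))                // true: the store is orphaned
```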
> *Fix:*
> Separate the {{updateUsage}} deduction (which correctly applies only to
> actively tracked apps) from the directory operation (which should apply
> whenever the directory exists on disk). When deleting an app that was not in
> the active map, derive its size directly from disk before deducting it from
> usage to keep accounting accurate.
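> A minimal sketch of the proposed fix, under stated assumptions: {{FixedDiskManager}}, the store naming in {{appStorePath}}, {{sizeOf}}, {{deleteRecursively}}, and seeding usage from disk at construction are all illustrative, not Spark's actual implementation. It keeps the recorded-size deduction for tracked apps, runs the directory operation whenever the directory exists, and measures an untracked store on disk before deleting it. Apart from the map lock, it is single-threaded.

```scala
import java.io.File
import java.nio.file.Files
import scala.collection.mutable

// Hypothetical stand-in for HistoryServerDiskManager; names are illustrative.
class FixedDiskManager(storeRoot: File) {
  private val active = mutable.Map.empty[(String, Option[String]), Long]

  // Recursive on-disk size of a file or directory.
  private def sizeOf(f: File): Long =
    if (f.isDirectory) Option(f.listFiles()).map(_.map(sizeOf).sum).getOrElse(0L)
    else f.length()

  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  // Assumption of this sketch: usage is seeded from whatever is already on disk.
  private var currentUsage: Long = sizeOf(storeRoot)
  def usage: Long = currentUsage

  // Hypothetical naming scheme for a store directory.
  def appStorePath(appId: String, attemptId: Option[String]): File =
    new File(storeRoot, appId + attemptId.map("_" + _).getOrElse("") + ".ldb")

  def track(appId: String, attemptId: Option[String]): Unit = active.synchronized {
    active((appId, attemptId)) = sizeOf(appStorePath(appId, attemptId))
  }

  def release(appId: String, attemptId: Option[String], delete: Boolean): Unit = {
    val oldSizeOpt = active.synchronized { active.remove((appId, attemptId)) }
    val path = appStorePath(appId, attemptId)
    // Accounting for tracked apps uses the recorded size.
    oldSizeOpt.foreach(size => currentUsage -= size)
    // The directory operation no longer depends on the active map.
    if (path.isDirectory) {
      if (delete) {
        // Untracked app: derive its size from disk before deducting it.
        if (oldSizeOpt.isEmpty) currentUsage -= sizeOf(path)
        deleteRecursively(path)
      } else if (oldSizeOpt.isDefined) {
        // Kept on disk: count its current size again.
        currentUsage += sizeOf(path)
      }
    }
  }
}

val root = Files.createTempDirectory("hs-stores").toFile
val store = new File(root, "app-1.ldb")
store.mkdirs()
Files.write(new File(store, "000001.sst").toPath, Array.fill[Byte](64)(1.toByte))

val mgr = new FixedDiskManager(root) // fresh instance: empty active map
mgr.release("app-1", None, delete = true)
println(store.exists()) // false
println(mgr.usage)      // 0
```

> Because the fresh instance starts with an empty active map, this exercises exactly the path the original {{release()}} skips.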
> *Steps to Reproduce:*
> # Start History Server with a non-trivial max disk usage setting.
> # Load several applications (their .ldb/.rdb stores are created on disk).
> # Close the application UIs (release without delete; stores remain on disk).
> # Restart the History Server (active map is now empty).
> # Wait for or trigger log expiration cleanup.
> # Observe that the .ldb/.rdb store directories are NOT deleted, even though
> {{release(delete = true)}} is called.
> *Expected:* Store directories are deleted when {{release(delete=true)}} is
> called.
> *Actual:* Store directories are silently left on disk when the app is not in
> the active map.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)