Shuai Lu created SPARK-56044:
--------------------------------
Summary: HistoryServerDiskManager does not delete app store on
release when app is not in active map
Key: SPARK-56044
URL: https://issues.apache.org/jira/browse/SPARK-56044
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 4.1.1, 3.5.7, 4.1.0, 3.1.1, 4.2.0
Reporter: Shuai Lu
In {{HistoryServerDiskManager.release()}}, the store directory deletion is
gated inside an {{oldSizeOpt.foreach}} block, which only executes when the
application is present in the in-memory {{active}} map:
{code:scala}
val oldSizeOpt = active.synchronized {
  active.remove(appId -> attemptId)
}
oldSizeOpt.foreach { oldSize =>
  val path = appStorePath(appId, attemptId)
  updateUsage(-oldSize, committed = true)
  if (path.isDirectory()) {
    if (delete) {
      deleteStore(path) // never reached if app is not in active map
    }
    ...
  }
}
{code}
The {{active}} map is in-memory only and is empty after a History Server
restart. When log expiration triggers {{release(appId, attemptId, delete =
true)}} for an app that was never reopened after a restart, {{oldSizeOpt}} is
{{None}}, the block is skipped entirely, and the on-disk store directory (.ldb
/ .rdb) is never deleted. Over time these orphaned store directories
accumulate, consuming disk space indefinitely.
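The skipped block can be reproduced in isolation. The following is a minimal sketch, not Spark code: {{EmptyActiveMapDemo}}, the {{deleteReached}} flag, and the {{app-1}} key are hypothetical names used only to show that {{remove}} on an empty map yields {{None}} and that {{foreach}} on {{None}} never runs its body.
{code:scala}
import scala.collection.mutable

object EmptyActiveMapDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for the in-memory `active` map right after a restart: nothing is tracked.
    val active = mutable.HashMap.empty[(String, Option[String]), Long]
    val key = ("app-1", Option.empty[String]) // hypothetical app id, no attempt id

    // Stand-in for the delete call that lives inside the foreach block.
    var deleteReached = false

    val oldSizeOpt = active.synchronized { active.remove(key) }
    oldSizeOpt.foreach { _ => deleteReached = true }

    assert(oldSizeOpt.isEmpty) // the key was never in the map
    assert(!deleteReached)     // so the body, and any deletion in it, was skipped
  }
}
{code}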
*Fix:*
Separate the {{updateUsage}} deduction (which correctly applies only to
actively tracked apps) from the directory operation (which should apply
whenever the directory exists on disk). When deleting an app that was not in
the active map, derive its size directly from disk before deducting it from
usage to keep accounting accurate.
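One possible shape of the proposed change is sketched below against a simplified model, not the real class. {{DiskManagerSketch}}, {{currentUsage}}, the {{.ldb}} naming in {{appStorePath}}, the single {{usage}} counter, and the assumption that usage is rebuilt from on-disk sizes at startup are all hypothetical simplifications made for illustration; the actual patch may differ.
{code:scala}
import java.io.File
import java.nio.file.Files
import scala.collection.mutable

// Hypothetical, simplified model; NOT the real HistoryServerDiskManager.
class DiskManagerSketch(storeRoot: File) {
  private val active = mutable.HashMap.empty[(String, Option[String]), Long]
  // Assumption: after a restart, usage is rebuilt from what is already on disk.
  private var usage: Long = sizeOf(storeRoot)

  def currentUsage: Long = usage

  // Hypothetical naming scheme for store directories.
  def appStorePath(appId: String, attemptId: Option[String]): File =
    new File(storeRoot, appId + attemptId.map("_" + _).getOrElse("") + ".ldb")

  // Recursively sums the sizes of all regular files under `f`.
  private def sizeOf(f: File): Long =
    if (f.isDirectory) Option(f.listFiles()).map(_.map(sizeOf).sum).getOrElse(0L)
    else f.length()

  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  def release(appId: String, attemptId: Option[String], delete: Boolean): Unit = {
    val oldSizeOpt = active.synchronized { active.remove(appId -> attemptId) }
    val path = appStorePath(appId, attemptId)

    // Usage deduction for the tracked size applies only to actively tracked apps.
    oldSizeOpt.foreach(oldSize => usage -= oldSize)

    // The directory operation applies whenever the directory exists on disk.
    if (path.isDirectory) {
      if (delete) {
        // Not tracked in memory: derive the size from disk before deducting it.
        if (oldSizeOpt.isEmpty) usage -= sizeOf(path)
        deleteRecursively(path)
      } else if (oldSizeOpt.isDefined) {
        // Store stays on disk: account for its current size again.
        usage += sizeOf(path)
      }
    }
  }
}

object ReleaseSketchDemo {
  def main(args: Array[String]): Unit = {
    val root = Files.createTempDirectory("hs-store").toFile
    val store = new File(root, "app-1.ldb")
    store.mkdirs()
    Files.write(new File(store, "data").toPath, Array.fill[Byte](128)(0))

    // A fresh manager models a restarted History Server: the active map is empty.
    val mgr = new DiskManagerSketch(root)
    mgr.release("app-1", None, delete = true)

    assert(!store.exists())        // the orphaned store directory is removed
    assert(mgr.currentUsage == 0L) // and its size is deducted from usage
    root.delete()
  }
}
{code}
In this model the directory check sits outside the {{oldSizeOpt.foreach}} block, so {{release(delete = true)}} removes the store whether or not the app was reopened after the restart.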
*Steps to Reproduce:*
# Start History Server with a non-trivial max disk usage setting.
# Load several applications (their .ldb/.rdb stores are created on disk).
# Close the application UIs (release without delete -- stores remain on disk).
# Restart the History Server (active map is now empty).
# Wait for or trigger log expiration cleanup.
# Observe that the .ldb/.rdb store directories are NOT deleted despite
{{release(delete=true)}} being called.
*Expected:* Store directories are deleted when {{release(delete=true)}} is
called.
*Actual:* Store directories are silently left on disk when the app is not in
the active map.