Chang Chen created SPARK-56918:
----------------------------------
Summary: Add ManagedConsumer SPI for shrinkable external storage memory
Key: SPARK-56918
URL: https://issues.apache.org/jira/browse/SPARK-56918
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.2.0
Reporter: Chang Chen
Today an executor's off-heap memory falls into three categories from Spark's
perspective:
|| Category || Owner || Spark accounting || Releasable on demand? ||
| Execution | Spark task allocations (sort, agg, join) | Tracked, arbitrated | Yes (spill) |
| Storage | Spark RDD cache / broadcast (MemoryStore) | Tracked, arbitrated | Yes (evict) |
| Unmanaged | RocksDB state store (SPARK-53001) and similar | *Reported* only (pull-mode poll) | *No* (informational) |
A fourth category is emerging and has no home today: *shrinkable external
caches that are per-executor singletons*, serving the whole executor. Examples
are Velox {{AsyncDataCache}} (Gluten) and other native columnar caches. They
share the same off-heap budget as Spark storage and, unlike unmanaged memory,
*can release bytes on request* by evicting cold pages.
For these consumers, both existing options are wrong:
* Treating them as *storage memory* via {{MemoryStore}} doesn't work:
{{MemoryStore}} is bound to {{BlockManager}}, {{SerializerManager}}, and
{{BlockEvictionHandler}}, which is exactly the limitation SPARK-48694 called
out.
* Treating them as *unmanaged memory* via {{UnmanagedMemoryConsumer}}
(SPARK-53001) is informational only: Spark can subtract their usage from
{{effectiveMaxMemory}} but cannot ask them to release when storage pressure
rises, and they cannot ask Spark for more memory when storage is idle.
The result today is a static partitioning of off-heap memory between Spark
storage and the external cache, which defeats the purpose of unified memory
management.
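
To make the missing behaviour concrete, here is a minimal sketch of the two-way
contract described above: the arbiter can ask an external cache to release
bytes under storage pressure, and the cache can ask for more when storage is
idle. Everything in it is hypothetical and for illustration only. The names
ShrinkableConsumer, ToyExternalCache, ToyOffHeapArbiter and all of their
methods are assumptions made for this example; they are not the SPI proposed
in this ticket and not existing Spark or Velox APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract (an assumption for this sketch, not Spark's API).
interface ShrinkableConsumer {
    // Bytes this consumer currently holds from the shared off-heap budget.
    long usedBytes();

    // Ask the consumer to release up to `bytes`; returns the bytes actually freed.
    long shrink(long bytes);
}

// Toy stand-in for a per-executor native cache; "eviction" just lowers a counter.
final class ToyExternalCache implements ShrinkableConsumer {
    private long used;

    @Override public long usedBytes() { return used; }

    @Override public long shrink(long bytes) {
        long freed = Math.min(Math.max(bytes, 0L), used);
        used -= freed;
        return freed;
    }

    // Called by the arbiter after it has granted additional bytes.
    void grow(long bytes) { used += bytes; }
}

// Toy arbiter over a single budget shared by storage and external consumers.
final class ToyOffHeapArbiter {
    private final long maxBytes;
    private long storageUsed;
    private final List<ShrinkableConsumer> consumers = new ArrayList<>();

    ToyOffHeapArbiter(long maxBytes) { this.maxBytes = maxBytes; }

    void register(ShrinkableConsumer c) { consumers.add(c); }

    long freeBytes() {
        long external = 0;
        for (ShrinkableConsumer c : consumers) external += c.usedBytes();
        return maxBytes - storageUsed - external;
    }

    // Storage pressure: shrink external consumers until the request fits.
    // In this toy, bytes evicted before a failed request are not given back.
    boolean acquireStorage(long bytes) {
        long shortfall = bytes - freeBytes();
        for (ShrinkableConsumer c : consumers) {
            if (shortfall <= 0) break;
            shortfall -= c.shrink(shortfall);
        }
        if (shortfall > 0) return false;
        storageUsed += bytes;
        return true;
    }

    // Idle storage: the external cache may take whatever is currently free.
    long acquireForConsumer(ToyExternalCache c, long bytes) {
        long granted = Math.max(0L, Math.min(bytes, freeBytes()));
        c.grow(granted);
        return granted;
    }
}
```

With a 100-byte budget, the toy cache can first grow to 80 bytes while storage
is idle; a later 50-byte storage request then shrinks the cache to 30 bytes
instead of failing. Neither step is possible under static partitioning or
report-only accounting.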
