[
https://issues.apache.org/jira/browse/SPARK-41053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685671#comment-17685671
]
Dongjoon Hyun commented on SPARK-41053:
---------------------------------------
Hi, [~Gengliang.Wang]. Shall we resolve this issue?
> Better Spark UI scalability and Driver stability for large applications
> -----------------------------------------------------------------------
>
> Key: SPARK-41053
> URL: https://issues.apache.org/jira/browse/SPARK-41053
> Project: Spark
> Issue Type: Umbrella
> Components: Spark Core, Web UI
> Affects Versions: 3.4.0
> Reporter: Gengliang Wang
> Priority: Major
> Labels: releasenotes
> Attachments: Better Spark UI scalability and Driver stability for
> large applications.pdf
>
>
> After SPARK-18085, the Spark History Server (SHS) became more scalable for
> processing large applications by supporting a persistent
> KV store (LevelDB/RocksDB) as the storage layer.
> For the live Spark UI, however, all the data is still stored in memory, which
> can put memory pressure on the Spark driver for large applications.
> For better Spark UI scalability and driver stability, I propose to:
> * *Support storing all the UI data in a persistent KV store.*
> RocksDB/LevelDB have low memory overhead, and their read/write performance is
> fast enough to serve the read/write workload of the live UI. SHS can also
> leverage the persistent KV store to speed up its startup.
> * *Support a new Protobuf serializer for all the UI data.* Benchmarks
> indicate that the new serializer is faster. It will be the default serializer
> for the persistent KV store of the live UI. For event logs, it is optional.
> The current serializer for UI data is JSON, and its output is GZip-compressed
> when written to the persistent KV store. Since RocksDB/LevelDB support
> compression themselves, the new serializer won’t compress its output
> before writing to the persistent KV store. Here is a benchmark of
> writing/reading 100,000 SQLExecutionUIData to/from RocksDB:
>
> |*Serializer*|*Avg write time (μs)*|*Avg read time (μs)*|*RocksDB file total size (MB)*|*Result total size in memory (MB)*|
> |*Spark’s KV Serializer (JSON+gzip)*|352.2|119.26|837|868|
> |*Protobuf*|109.9|34.3|858|2105|
> I also propose supporting only RocksDB in the live UI, rather than both
> LevelDB and RocksDB.
> SPIP:
> [https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing]
> SPIP vote: https://lists.apache.org/thread/lom4zcob6237q6nnj46jylkzwmmsxvgj
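
For reference, the relative figures implied by the benchmark table in the quoted description can be worked out with a short script. This is only a sketch: the numbers are copied from that table, while the `BENCHMARK` dictionary layout and the `ratio` helper are illustrative names and not part of Spark.

```python
# Figures copied from the benchmark table in the issue description
# (100,000 SQLExecutionUIData records written to / read from RocksDB).
BENCHMARK = {
    "json_gzip": {"write_us": 352.2, "read_us": 119.26, "file_mb": 837, "memory_mb": 868},
    "protobuf": {"write_us": 109.9, "read_us": 34.3, "file_mb": 858, "memory_mb": 2105},
}


def ratio(metric: str) -> float:
    """Return the JSON+gzip value divided by the Protobuf value for one metric."""
    return BENCHMARK["json_gzip"][metric] / BENCHMARK["protobuf"][metric]


if __name__ == "__main__":
    # A ratio above 1 means the Protobuf figure is the smaller of the two.
    for metric in ("write_us", "read_us", "file_mb", "memory_mb"):
        print(f"{metric}: JSON+gzip / Protobuf = {ratio(metric):.2f}")
```

On these numbers, the Protobuf serializer writes about 3.2x faster and reads about 3.5x faster. Its RocksDB file total is about 2.5% larger, and its result total size in memory is roughly 2.4x larger.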
--
This message was sent by Atlassian Jira
(v8.20.10#820010)