[jira] [Commented] (SPARK-41053) Better Spark UI scalability and Driver stability for large applications

Gengliang Wang (Jira) Thu, 22 Dec 2022 00:54:10 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-41053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651170#comment-17651170
 ]


Gengliang Wang commented on SPARK-41053:
----------------------------------------

[~LuciferYang] [~techaddict] FYI I just marked LogInfo and ApplicationStoreInfo 
as won't do. Seem trivial. Adding too many protobuf serializers increase the 
maintenance cost. However, currently there is no rule for which one should be 
in.  

What do you think?

 

> Better Spark UI scalability and Driver stability for large applications
> -----------------------------------------------------------------------
>
>                 Key: SPARK-41053
>                 URL: https://issues.apache.org/jira/browse/SPARK-41053
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core, Web UI
>    Affects Versions: 3.4.0
>            Reporter: Gengliang Wang
>            Priority: Major
>         Attachments: Better Spark UI scalability and Driver stability for 
> large applications.pdf
>
>
> After SPARK-18085, the Spark history server(SHS) becomes more scalable for 
> processing large applications by supporting a persistent 
> KV-store(LevelDB/RocksDB) as the storage layer.
> As for the live Spark UI, all the data is still stored in memory, which can 
> bring memory pressures to the Spark driver for large applications.
> For better Spark UI scalability and Driver stability, I propose to
>  * {*}Support storing all the UI data in a persistent KV store{*}. 
> RocksDB/LevelDB provides low memory overhead. Their write/read performance is 
> fast enough to serve the write/read workload for live UI. SHS can leverage 
> the persistent KV store to fasten its startup.
>  * *Support a new Protobuf serializer for all the UI data.* The new 
> serializer is supposed to be faster, according to benchmarks. It will be the 
> default serializer for the persistent KV store of live UI. As for event logs, 
> it is optional. The current serializer for UI data is JSON. When writing 
> persistent KV-store, there is GZip compression. Since there is compression 
> support in RocksDB/LevelDB, the new serializer won’t compress the output 
> before writing to the persistent KV store. Here is a benchmark of 
> writing/reading 100,000 SQLExecutionUIData to/from RocksDB:
>  
> |*Serializer*|*Avg Write time(μs)*|*Avg Read time(μs)*|*RocksDB File Total 
> Size(MB)*|*Result total size in memory(MB)*|
> |*Spark’s KV Serializer(JSON+gzip)*|352.2|119.26|837|868|
> |*Protobuf*|109.9|34.3|858|2105|
> I am also proposing to support RocksDB instead of both LevelDB & RocksDB in 
> the live UI.
> SPIP: 
> [https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing]
> SPIP vote: https://lists.apache.org/thread/lom4zcob6237q6nnj46jylkzwmmsxvgj



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-41053) Better Spark UI scalability and Driver stability for large applications

Reply via email to