Gengliang Wang created SPARK-41053:
--------------------------------------

             Summary: Support disk-based KV store in Spark live UI
                 Key: SPARK-41053
                 URL: https://issues.apache.org/jira/browse/SPARK-41053
             Project: Spark
          Issue Type: Umbrella
          Components: Spark Core, Web UI
    Affects Versions: 3.4.0
            Reporter: Gengliang Wang


The current architecture of the Spark live UI and the Spark History Server 
(SHS) is too simple to serve large clusters and heavy workloads:
 * Spark stores all the live UI data in memory. It can grow to a few GB, which 
affects the driver's stability (OOM).
 * The live UI keeps at most 1,000 queries, and we can't simply raise this 
limit under the current architecture. I profiled memory usage: storing one 
query execution detail takes about 800KB, while storing one task takes about 
0.3KB. So for 1,000 SQL queries with 2,000 tasks each (1000 * 2000 tasks in 
total), query execution and task data alone use about 1.4GB (0.8GB + 0.6GB). 
The live UI also stores data for jobs, stages, and executors, so storing 10k 
queries may take more than 14GB.
 * SHS has to parse JSON-format event logs on initial startup. Uncompressed 
event logs can be a few GB, and parsing them can be quite slow: some users 
reported waiting more than half an hour.
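
The memory estimate in the second bullet can be reproduced with a short calculation. This is a sketch that uses the per-item costs reported above (about 800KB per query execution, about 0.3KB per task) and assumes 2,000 tasks per query and decimal units (1KB = 1,000 bytes). The function name is hypothetical, and it deliberately leaves out job, stage, and executor data, which is why the real footprint is higher.

```python
# Per-item costs from the memory profiling above, in bytes (1KB = 1,000 bytes).
QUERY_EXECUTION_BYTES = 800_000  # ~800KB per query execution detail
TASK_BYTES = 300                 # ~0.3KB per task
TASKS_PER_QUERY = 2_000          # assumption: 2,000 tasks per query


def estimate_query_and_task_bytes(num_queries: int) -> int:
    """Estimate memory for query execution and task data only.

    Job, stage, and executor data are not counted, so this is a lower bound.
    """
    per_query = QUERY_EXECUTION_BYTES + TASKS_PER_QUERY * TASK_BYTES
    return num_queries * per_query


for n in (1_000, 10_000):
    gb = estimate_query_and_task_bytes(n) / 1_000_000_000
    print(f"{n} queries -> {gb:.1f} GB")
```

This prints 1.4 GB for 1,000 queries and 14.0 GB for 10,000 queries, matching the figures above before the other UI data is added.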

The proposal is to:
 # Store all the live UI data in local RocksDB with protobuf serialization.
 # Let SHS use the live UI's RocksDB files directly.
 # If the RocksDB file is unavailable to SHS, write event logs with protobuf 
for faster replay.
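
To make items 1 and 2 more concrete, here is a minimal sketch of a disk-backed UI store whose file one process writes and a second process reopens later. It is not Spark's implementation: Python's standard-library `sqlite3` stands in for RocksDB, a fixed `struct` layout stands in for a protobuf message, and the names `DiskKVStore`, `encode_task`, `decode_task`, and the task fields are hypothetical.

```python
from __future__ import annotations

import os
import sqlite3
import struct
import tempfile

# Hypothetical fixed-layout task record standing in for a protobuf message:
# task_id (int64), stage_id (int32), duration_ms (int64), little-endian.
TASK_FMT = "<qiq"


def encode_task(task_id: int, stage_id: int, duration_ms: int) -> bytes:
    return struct.pack(TASK_FMT, task_id, stage_id, duration_ms)


def decode_task(blob: bytes) -> tuple[int, int, int]:
    return struct.unpack(TASK_FMT, blob)


class DiskKVStore:
    """Minimal on-disk key-value store; sqlite3 stands in for RocksDB."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB NOT NULL)"
        )

    def write(self, kind: str, key: int, value: bytes) -> None:
        # Keys are prefixed with the entity kind so jobs, stages, and tasks
        # can share one file. Committing per write keeps the sketch simple;
        # a real store would batch writes.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (f"{kind}/{key}", value)
        )
        self._conn.commit()

    def read(self, kind: str, key: int) -> bytes | None:
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (f"{kind}/{key}",)
        ).fetchone()
        return None if row is None else row[0]

    def close(self) -> None:
        self._conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ui.db")
    live = DiskKVStore(path)  # the "live UI" writes to disk, not the heap
    live.write("task", 7, encode_task(7, 3, 1500))
    live.close()
    history = DiskKVStore(path)  # a later reader (e.g. SHS) reopens the file
    print(decode_task(history.read("task", 7)))
    history.close()
```

Because the data lives in a file rather than in the driver's heap, memory use no longer grows with the number of stored queries, and a reader that opens the same file can skip replaying event logs.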
