Nan Zhu created SPARK-54596:
-------------------------------

             Summary: Burst-aware memoryOverhead Allocation Algorithm for 
Spark@K8S
                 Key: SPARK-54596
                 URL: https://issues.apache.org/jira/browse/SPARK-54596
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes, Spark Core
    Affects Versions: 4.2
            Reporter: Nan Zhu


memoryOverhead is one of the most significant memory consumers for Spark 
workloads. Users tend to book a big chunk of this memory space to avoid 
triggering the OOM killer in the cluster environment.

However, the usage pattern of memoryOverhead space (serialization/
deserialization, compression/decompression, direct memory, etc.) is usually 
bursty. As a result, users have to book the peak usage of memoryOverhead to 
ensure the reliability of their jobs, leaving a big chunk of this space unused 
for most of the job lifecycle.


To resolve this problem in the context of Spark@K8S, we implemented the 
algorithm presented in the paper "Towards Resource Efficiency: Practical 
Insights into Large-Scale Spark Workloads at ByteDance" from the ByteDance team 
(https://www.vldb.org/pvldb/vol17/p3759-shi.pdf). We introduce a configurable 
parameter that splits the user-requested memoryOverhead space into two parts, 
guaranteed and shared. The guaranteed part is dedicated to a pod started by a 
Spark job, and the shared part is used by multiple pods in a time-sharing 
fashion. Specifically, in the K8S environment, the memory request of a pod is 
user-requested on-heap space + guaranteed memoryOverhead, and the memory limit 
is user-requested on-heap space + guaranteed memoryOverhead + shared 
memoryOverhead.
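
The request/limit arithmetic above can be illustrated with a minimal sketch. 
This is not the code in the patch: the function name, the PodMemory type, and 
the guaranteed_fraction parameter are hypothetical stand-ins for the 
configurable split parameter, and sizes are assumed to be in MiB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodMemory:
    request_mib: int  # memory the scheduler reserves for the pod
    limit_mib: int    # hard cap enforced on the pod


def split_memory_overhead(heap_mib: int,
                          overhead_mib: int,
                          guaranteed_fraction: float) -> PodMemory:
    """Split the user-requested memoryOverhead into guaranteed and shared parts.

    guaranteed_fraction is a hypothetical name for the configurable split
    parameter; it is assumed here to be a ratio in [0, 1].
    """
    if not 0.0 <= guaranteed_fraction <= 1.0:
        raise ValueError("guaranteed_fraction must be within [0, 1]")

    # Part of the overhead that is dedicated to this pod.
    guaranteed_mib = int(overhead_mib * guaranteed_fraction)
    # Remainder is time-shared with other pods.
    shared_mib = overhead_mib - guaranteed_mib

    return PodMemory(
        request_mib=heap_mib + guaranteed_mib,
        limit_mib=heap_mib + guaranteed_mib + shared_mib,
    )


# Example: 8 GiB on-heap, 2 GiB memoryOverhead, 25% of it guaranteed.
pod = split_memory_overhead(heap_mib=8192, overhead_mib=2048,
                            guaranteed_fraction=0.25)
print(pod.request_mib, pod.limit_mib)
```

Because the guaranteed and shared parts add up to the full user-requested 
memoryOverhead, the limit stays equal to on-heap space + memoryOverhead; only 
the request shrinks as less of the overhead is guaranteed.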



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

