Nan Zhu created SPARK-54596:
-------------------------------
Summary: Burst-aware memoryOverhead Allocation Algorithm for
Spark@K8S
Key: SPARK-54596
URL: https://issues.apache.org/jira/browse/SPARK-54596
Project: Spark
Issue Type: Improvement
Components: Kubernetes, Spark Core
Affects Versions: 4.2
Reporter: Nan Zhu
memoryOverhead is one of the most significant memory consumers for Spark
workloads. Users tend to reserve a large amount of this memory to avoid
triggering the OOM killer in a cluster environment.
However, the usage pattern of memoryOverhead space (se/deser,
compress/decompress, direct memory, etc.) is usually bursty. As a result,
users have to reserve the peak usage of memoryOverhead to keep their jobs
reliable, while a large part of this space sits unused for most of the job
lifecycle.
To resolve this problem in the context of Spark@K8S, we implemented the
algorithm presented in the paper "Towards Resource Efficiency: Practical
Insights into Large-Scale Spark Workloads at ByteDance" from the ByteDance team
(https://www.vldb.org/pvldb/vol17/p3759-shi.pdf). We introduce a
configurable parameter that splits the user-requested memoryOverhead space into
two parts, guaranteed and shared. The guaranteed part is dedicated to a pod
started by a Spark job, and the shared part is used by multiple pods in a
time-sharing fashion. Specifically, in the K8S environment:
- the memory request of a pod is user-requested on-heap space + guaranteed
  memoryOverhead
- the memory limit of a pod is user-requested on-heap space + guaranteed
  memoryOverhead + shared memoryOverhead
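The request/limit arithmetic described above can be sketched as follows. This
is only an illustration under stated assumptions, not the code in the patch:
the function name pod_memory, the parameter guaranteed_fraction (standing in
for the configurable parameter, whose real name is not given here), and the
use of MiB with round-down are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodMemory:
    request_mib: int  # memory request of the pod
    limit_mib: int    # memory limit of the pod


def pod_memory(on_heap_mib: int, overhead_mib: int,
               guaranteed_fraction: float) -> PodMemory:
    """Split memoryOverhead into guaranteed/shared parts (hypothetical helper).

    guaranteed_fraction is an assumed knob in [0, 1]: the share of the
    user-requested memoryOverhead that is dedicated to the pod.
    """
    if not 0.0 <= guaranteed_fraction <= 1.0:
        raise ValueError("guaranteed_fraction must be in [0, 1]")
    # Round the guaranteed part down; the remainder becomes the shared part.
    guaranteed = int(overhead_mib * guaranteed_fraction)
    shared = overhead_mib - guaranteed
    request = on_heap_mib + guaranteed          # on-heap + guaranteed
    limit = on_heap_mib + guaranteed + shared   # on-heap + guaranteed + shared
    return PodMemory(request_mib=request, limit_mib=limit)


if __name__ == "__main__":
    # 8 GiB on-heap, 2 GiB memoryOverhead, 25% of the overhead guaranteed.
    print(pod_memory(8192, 2048, 0.25))
```

With these example numbers the request is 8704 MiB and the limit is 10240 MiB.
The limit always equals on-heap + full memoryOverhead, so a fraction of 1.0
gives request == limit, i.e. no sharing.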