I think I'm still missing something in the big picture:

- Is the memory overhead off-heap? The formula indicates a fixed heap size, and memory overhead can't be dynamic if it's on-heap.
- Do Spark applications have static profiles? When we submit stages, the cluster is already allocated; how can we change anything?
- How do we assign the shared memory overhead? Fairly among all applications on the same physical node?
On Tue, Dec 9, 2025 at 2:15 PM Nan Zhu <[email protected]> wrote:

> we didn't separate the design into another doc since the main idea is
> relatively simple...
>
> for the request/limit calculation, I described it in Q4 of the SPIP doc
> https://docs.google.com/document/d/1v5PQel1ygVayBFS8rdtzIH8l1el6H1TDjULD3EyBeIc/edit?tab=t.0#heading=h.q4vjslmnfuo0
>
> it is calculated per profile (you can say it is per stage); when the
> cluster manager composes the pod spec, it calculates the new memory
> overhead based on what the user asks for in that resource profile
>
> On Mon, Dec 8, 2025 at 9:49 PM Wenchen Fan <[email protected]> wrote:
>
>> Do we have a design sketch? How do we determine the memory request and
>> limit? Is it per stage or per executor?
>>
>> On Tue, Dec 9, 2025 at 1:40 PM Nan Zhu <[email protected]> wrote:
>>
>>> yeah, the implementation basically relies on the request/limit
>>> concept in K8s, ...
>>>
>>> but if any other cluster manager comes along in the future, as long as
>>> it has a similar concept, it can leverage this easily, since the main
>>> logic is implemented in ResourceProfile
>>>
>>> On Mon, Dec 8, 2025 at 9:34 PM Wenchen Fan <[email protected]> wrote:
>>>
>>>> Is this feature only available on K8s because it allows containers to
>>>> have dynamic resources?
>>>>
>>>> On Mon, Dec 8, 2025 at 12:46 PM Yao <[email protected]> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> We are proposing a burst-aware memoryOverhead allocation algorithm for
>>>>> Spark@K8S to improve the memory utilization of Spark clusters.
>>>>> Please see more details in the SPIP doc
>>>>> <https://docs.google.com/document/d/1v5PQel1ygVayBFS8rdtzIH8l1el6H1TDjULD3EyBeIc/edit?tab=t.0>.
>>>>> Feedback and discussion are welcome.
>>>>>
>>>>> Thanks to Chao for being the shepherd of this feature.
>>>>> I also want to thank the authors of the original paper
>>>>> <https://www.vldb.org/pvldb/vol17/p3759-shi.pdf> from ByteDance,
>>>>> specifically Rui ([email protected]) and Yixin
>>>>> ([email protected]).
>>>>>
>>>>> Thank you.
>>>>> Yao Wang
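
To make the per-profile request/limit split described above concrete, here is a minimal Scala sketch. The names (ProfileMemory, burstFactor, requestAndLimitMiB) and the burst-factor split are hypothetical illustrations, not the SPIP's actual code; the real formula is in Q4 of the SPIP doc linked above.

    // Hypothetical illustration only -- not the SPIP's implementation.
    // Shows how a burst-aware memoryOverhead could translate into a K8s
    // request/limit pair, computed per resource profile.
    case class ProfileMemory(
        heapMiB: Long,      // spark.executor.memory: fixed on-heap size
        overheadMiB: Long,  // spark.executor.memoryOverhead: off-heap, bursty
        burstFactor: Double // hypothetical knob: fraction of the overhead
    )                       // guaranteed up front; the rest is burst headroom

    object PodMemorySpec {
      // K8s "request" = guaranteed baseline the scheduler reserves;
      // K8s "limit"   = hard cap the container may burst up to.
      // The heap is fully requested because the JVM commits it at startup;
      // only the off-heap overhead is split between request and limit.
      def requestAndLimitMiB(p: ProfileMemory): (Long, Long) = {
        val request = p.heapMiB + math.ceil(p.overheadMiB * p.burstFactor).toLong
        val limit   = p.heapMiB + p.overheadMiB
        (request, limit)
      }

      def main(args: Array[String]): Unit = {
        // e.g. 8 GiB heap, 2 GiB overhead, guarantee half of the overhead
        val (req, lim) = requestAndLimitMiB(
          ProfileMemory(heapMiB = 8192, overheadMiB = 2048, burstFactor = 0.5))
        println(s"request=${req}MiB limit=${lim}MiB") // request=9216MiB limit=10240MiB
      }
    }

Under such a split, the K8s scheduler packs pods by the smaller request, so overhead that only bursts occasionally does not have to be fully reserved on every node.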
