I think I'm still missing something in the big picture (see the sketch after
these questions):

   - Is the memory overhead off-heap? The formula indicates a fixed heap
   size, and the memory overhead can't be dynamic if it's on-heap.
   - Do Spark applications have static profiles? By the time we submit
   stages, the cluster is already allocated; how can we change anything?
   - How do we assign the shared memory overhead? Is it split fairly among
   all applications on the same physical node?

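To make my confusion concrete, here is how I currently read the per-profile
calculation from Q4 of the doc (a rough Scala sketch; ExecutorMemoryProfile,
podMemory, and overheadRequestRatio are my own placeholder names, not from
the SPIP or from Spark):

    // Sketch only, under my assumptions -- not the SPIP's actual code.
    // K8s "request" is what the scheduler reserves on the node; "limit"
    // is the hard cap the container may burst up to at runtime.
    case class ExecutorMemoryProfile(
        heapMiB: Long,           // spark.executor.memory: fixed on-heap size
        memoryOverheadMiB: Long) // off-heap overhead asked for in the profile

    // Hypothetical burst-aware split: request only a fraction of the
    // overhead up front, but let the container burst to the full overhead
    // via the limit, so the scheduler can pack nodes more tightly.
    def podMemory(p: ExecutorMemoryProfile,
                  overheadRequestRatio: Double): (Long, Long) = {
      val requestMiB =
        p.heapMiB + (p.memoryOverheadMiB * overheadRequestRatio).toLong
      val limitMiB = p.heapMiB + p.memoryOverheadMiB
      (requestMiB, limitMiB) // -> resources.requests.memory / limits.memory
    }

If that reading is wrong, it probably explains the questions above.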

On Tue, Dec 9, 2025 at 2:15 PM Nan Zhu <[email protected]> wrote:

> we didn't separate the design into another doc since the main idea is
> relatively simple...
>
> for the request/limit calculation, I described it in Q4 of the SPIP doc
> https://docs.google.com/document/d/1v5PQel1ygVayBFS8rdtzIH8l1el6H1TDjULD3EyBeIc/edit?tab=t.0#heading=h.q4vjslmnfuo0
>
> it is calculated per profile (you can say per stage); when the cluster
> manager composes the pod spec, it calculates the new memory overhead based
> on what the user asks for in that resource profile
>
> On Mon, Dec 8, 2025 at 9:49 PM Wenchen Fan <[email protected]> wrote:
>
>> Do we have a design sketch? How do we determine the memory request and
>> limit? Is it per stage or per executor?
>>
>> On Tue, Dec 9, 2025 at 1:40 PM Nan Zhu <[email protected]> wrote:
>>
>>> yeah, the implementation basically relies on the request/limit concept
>>> in K8S, ...
>>>
>>> but if any other cluster manager comes along in the future, as long as
>>> it has a similar concept, it can leverage this easily, since the main
>>> logic is implemented in ResourceProfile
>>>
>>> On Mon, Dec 8, 2025 at 9:34 PM Wenchen Fan <[email protected]> wrote:
>>>
>>>> Is this feature only available on k8s because it allows containers to
>>>> have dynamic resources?
>>>>
>>>> On Mon, Dec 8, 2025 at 12:46 PM Yao <[email protected]> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> We are proposing a burst-aware memoryOverhead allocation algorithm for
>>>>> Spark@K8S to improve the memory utilization of Spark clusters.
>>>>> Please see more details in the SPIP doc
>>>>> <https://docs.google.com/document/d/1v5PQel1ygVayBFS8rdtzIH8l1el6H1TDjULD3EyBeIc/edit?tab=t.0>.
>>>>> Feedback and discussion are welcome.
>>>>>
>>>>> Thanks to Chao for being the shepherd of this feature.
>>>>> I also want to thank the authors of the original paper
>>>>> <https://www.vldb.org/pvldb/vol17/p3759-shi.pdf> from ByteDance,
>>>>> specifically Rui ([email protected]) and Yixin
>>>>> ([email protected]).
>>>>>
>>>>> Thank you.
>>>>> Yao Wang
>>>>>
>>>>
