Hi Adam, Thanks for raising this initiative with the Spark committers again. I can relate to that. This feature has been operational for us (internally) for close to 2 years and has been waiting in the apache/spark codebase for some love.
So many people (non-committers) are interested in having this feature delivered; someone I know has already patched it in at their company and has been running huge workloads on it. I understand that is not the ideal way of doing it, but if you are interested, do let me know and I can help with that (internally, we have integrated it into 3 major Spark releases!). Jungtaek, I appreciate that you have already given some insights on this feature and on other cases that need to be handled (I have covered that case as well). I can help with building some context on DRA in core (this is my first contribution to Spark, and I was able to build context on it easily). We just have to understand ExecutorAllocationManager.scala <https://github.com/pkotikalapudi/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala>. *As you suggested, if there are other committers of the project who already understand DRA and CORE (or are willing to spend some time to understand them), please help in shipping this feature.* BTW, DRA just uses the ExecutorAllocationClient interface <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala>, so it should let us cover all the resource managers (I have personally run it on standalone and k8s). As Mich and Jungtaek suggested, let's move the discussion to this thread <https://lists.apache.org/thread/wpvtvf4w3zygtkfgq4sthbf00y5pqxvr>, or create a new one so that other committers can also pitch in. Thank you, Pavan On Tue, Feb 11, 2025 at 4:46 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > Hi all, > > OK, this DRA has already had a thread here > > Vote on Dynamic resource allocation for structured streaming [SPARK-24815] > > I recall I asked a committer to open the PR and it was opened and > closed because of inactivity. 
Pavan Kotikalapudi was working on it > > Happy to chip in and help where I can > > HTH > > Dr Mich Talebzadeh, > Architect | Data Science | Financial Crime | Forensic Analysis | GDPR > > view my Linkedin profile > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > > > > On Mon, 10 Feb 2025 at 23:05, Jungtaek Lim <kabhwan.opensou...@gmail.com> > wrote: > >> Let's move the discussion to the other thread, as it's not relevant to >> the board report. >> >> tl;dr. Spark has a crazily large codebase with multiple layers. SS is >> on top of SQL and SQL is on top of CORE. DRA is bound to CORE, especially >> used for specific resource managers like YARN (maybe we had dealt with K8S, >> I don't know.) There are not many people who are experts on multiple >> layers; I'm an expert on SS and have an understanding of SQL and the very >> essentials of CORE, but DRA is definitely an advanced topic in CORE. >> >> It's not because I don't want to incorporate it. I lack the knowledge to >> cover that, and I failed to find anyone to do this. Just wanted to clarify. >> >> On Tue, Feb 11, 2025 at 7:56 AM Adam Hobbs < >> adam.ho...@bendigoadelaide.com.au> wrote: >> >>> Thanks for the reply. I am not fussed how the situation is addressed >>> really, but I am just trying to keep the initiative alive. This isn’t the >>> first time I have tried to rescue it. >>> >>> The feature would deliver great cost savings and possibly greater >>> performance for my use case. >>> >>> After the disappointment of seeing the GitHub PR closed due to >>> inactivity, I was unsure how to re-ignite things, and it struck me that the >>> ASF board report might be a way to highlight the issue. 
>>> >>> I understand that Structured Streaming isn’t maybe the most common use >>> case for Spark and that Spark in and of itself is more of a batch-centric >>> technology; however, I strongly believe that DRA in the long-lived streaming >>> context is possibly even more important than DRA in the batch context. Running >>> a large Hadoop/Spark cluster 24x7 is expensive and could really benefit >>> from the functionality that proper streaming-workload-based DRA could >>> bring. >>> >>> Also, knowing that the PR author has been running this DRA code successfully in his >>> own environment for quite some time now makes it more >>> frustrating. The code has essentially been tested externally before the PR >>> was even raised. It seems to be more than just a theoretical improvement >>> to the codebase. >>> >>> >>> >>> Regards, >>> >>> >>> >>> Adam Hobbs >>> >>> >>> >>> >>> C2 - Internal Use >>> From: Jungtaek Lim <kabhwan.opensou...@gmail.com> >>> *Sent:* Tuesday, 11 February 2025 8:49 AM >>> *To:* adam.ho...@bendigoadelaide.com.au.invalid >>> *Cc:* Matei Zaharia <matei.zaha...@gmail.com>; Spark dev list < >>> dev@spark.apache.org> >>> *Subject:* Re: ASF board report draft for February 2025 >>> >>> >>> >>> CAUTION: This email originated from outside of the organisation. Do not >>> click links or open attachments unless you recognise the sender's full >>> email address and know the content is safe. >>> >>> >>> >>> Thanks Adam for your email. >>> >>> >>> >>> I started to look at these changes when they were proposed, but I am not familiar >>> with DRA. It needed non-trivial context building for me to be effective, >>> which I could not prioritize. I asked my team members to also review and >>> they were involved, but even they lacked context on how DRA works and its long-term >>> supportability and maintainability. >>> >>> >>> >>> When possible I shepherd other initiatives (SPIP), such as the Arbitrary >>> state processing API. 
If in the community there are folks who understand >>> DRA and its implications in terms of maintenance, it would be nice for them to >>> share the load and shepherd the project. >>> >>> >>> >>> In any case, this seems to be a prioritization conversation that can >>> perhaps be taken to another thread and not block this ASF board report. Is >>> that OK with you? >>> >>> >>> >>> On Thu, Feb 6, 2025 at 2:30 PM Adam Hobbs < >>> adam.ho...@bendigoadelaide.com.au.invalid> wrote: >>> >>> I'd like to add something about the failure to get any traction on >>> shepherding of the structured streaming DRA PR. Multiple times now there >>> have been calls for help to get this initiative over the line, and the >>> response has been disappointing. The GitHub PR has been closed due to >>> inaction (https://github.com/apache/spark/pull/42352). >>> >>> This seems like a bit of a failure in the process. >>> Regards, >>> >>> Adam Hobbs >>> >>> >>> -----Original Message----- >>> From: Matei Zaharia <matei.zaha...@gmail.com> >>> Sent: Thursday, 6 February 2025 2:57 PM >>> To: Spark dev list <dev@spark.apache.org> >>> Cc: priv...@spark.apache.org >>> Subject: ASF board report draft for February 2025 >>> >>> >>> It’s time to send our next ASF board report again on February 12th. >>> Here’s an initial draft — feel free to suggest changes: >>> >>> ===================== >>> >>> >>> Description: >>> >>> Apache Spark is a fast and general-purpose engine for large-scale data >>> processing. 
It offers high-level APIs in Java, Scala, Python, R and SQL as >>> well as a rich set of libraries including stream processing, machine >>> learning, and graph analytics. >>> >>> Issues for the board: >>> >>> - None >>> >>> Project status: >>> >>> - The Spark 4.0 branch has been cut and has entered the QA stage. We >>> encourage the community to test it out! >>> - We released Spark 3.5.4 on December 20th, 2024. >>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC >>> member (Jie Yang) to the project. >>> - The proposal to "Use plain text logs by default" was successfully >>> passed. >>> >>> Trademarks: >>> >>> - No changes since last report. >>> >>> Latest releases: >>> >>> - Spark 3.5.4 was released on Dec 20, 2024 >>> - Spark 3.4.4 was released on Oct 27, 2024 >>> - Spark 4.0 Preview 2 was released on Sept 26, 2024 >>> >>> Committers and PMC: >>> >>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan). >>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang). >>> >>> ===================== >>> --------------------------------------------------------------------- >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>> >>> >>> ******************************************************************************** >>> >>> This communication is intended only for use of the addressee and may >>> contain legally privileged and confidential information. >>> If you are not the addressee or intended recipient, you are notified >>> that any dissemination, copying or use of any of the information is >>> unauthorised. >>> >>> The legal privilege and confidentiality attached to this e-mail is not >>> waived, lost or destroyed by reason of a mistaken delivery to you. >>> If you have received this message in error, we would appreciate an >>> immediate notification via e-mail to contac...@bendigoadelaide.com.au >>> or by phoning 1300 BENDIGO (1300 236 344), and ask that the e-mail be >>> permanently deleted from your system. 
>>> >>> Bendigo and Adelaide Bank Limited ABN 11 068 049 178 >>> >>> >>> ******************************************************************************** >>> >>