Thanks, Matei. Looks like a plan! We resurrected the old thread!
https://lists.apache.org/thread/wwjyp1bhryvx7ytooj1lqtd8kgzxb6vq

Hopefully, there will be more traction this round.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Wed, 12 Feb 2025 at 19:40, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> I posted the report, but thanks for the feedback. Hopefully we can get
> enough coverage for DRA and the UI issues.
>
> On Feb 11, 2025, at 2:33 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Let us carry on in that thread.
>
> Need to catch up.
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> On Tue, 11 Feb 2025 at 06:01, Pavan Kotikalapudi <pkotikalap...@twilio.com>
> wrote:
>
>> Hi Adam,
>>
>> Thanks for bringing this initiative up again with the Spark committers.
>> It resonates with me. This feature has been operational for us
>> (internally) for close to 2 years, and it has been waiting in the
>> apache/spark codebase for some love.
>>
>> So many people (non-committers) are interested in having this feature
>> delivered. Someone I know has already patched it in at their company and
>> has been running huge workloads with it. I understand that is not the
>> ideal way of doing it, but if you are interested, do let me know and I
>> can help with that (we have internally integrated it into 3 major Spark
>> releases!).
>>
>> Jungtaek, I appreciate that you have already given some insights on this
>> feature and on which other cases need to be handled (I have covered that
>> case as well).
>> I can help with building some context on DRA in core (this is my first
>> contribution to Spark, and I could easily build context on it). We just
>> have to understand ExecutorAllocationManager.scala
>> <https://github.com/pkotikalapudi/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala>.
>> *As you suggested, if there are other committers on the project who
>> already understand DRA and CORE (or are willing to spend some time to
>> understand them), please help in shipping this feature.*
>> By the way, DRA just uses the ExecutorAllocationClient interface
>> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala>,
>> so it should help us cover all the resource managers (I have personally
>> run it in standalone mode and on k8s).
>>
>> As Mich and Jungtaek suggested, let's move the discussion to this thread
>> <https://lists.apache.org/thread/wpvtvf4w3zygtkfgq4sthbf00y5pqxvr>, or
>> create a new one so that other committers can also pitch in.
>>
>> Thank you,
>>
>> Pavan
>>
>> On Tue, Feb 11, 2025 at 4:46 AM Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> OK, this DRA work has already had a thread here:
>>>
>>> Vote on Dynamic resource allocation for structured streaming
>>> [SPARK-24815]
>>>
>>> I recall I asked a committer to open the PR, and it was opened and then
>>> closed because of inactivity.
>>> Pavan Kotikalapudi was working on it.
>>>
>>> Happy to chip in and help where I can.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>> view my LinkedIn profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> On Mon, 10 Feb 2025 at 23:05, Jungtaek Lim <kabhwan.opensou...@gmail.com>
>>> wrote:
>>>
>>>> Let's move the discussion to the other thread, as it's not relevant to
>>>> the board report.
>>>>
>>>> tl;dr: Spark has a crazily large codebase with multiple layers. SS is
>>>> on top of SQL, and SQL is on top of CORE. DRA is bound to CORE and is
>>>> especially used with specific resource managers like YARN (maybe we
>>>> have dealt with K8S too, I don't know). There are not many people who
>>>> are experts on multiple layers; I'm an expert on SS and have an
>>>> understanding of SQL and only the essentials of CORE, but DRA is
>>>> definitely an advanced topic in CORE.
>>>>
>>>> It's not that I don't want to incorporate it. I lack the knowledge to
>>>> cover it, and I failed to find anyone who could. Just wanted to
>>>> clarify.
>>>>
>>>> On Tue, Feb 11, 2025 at 7:56 AM Adam Hobbs
>>>> <adam.ho...@bendigoadelaide.com.au> wrote:
>>>>
>>>>> Thanks for the reply. I am not fussed about how the situation is
>>>>> addressed, really; I am just trying to keep the initiative alive.
>>>>> This isn’t the first time I have tried to rescue it.
>>>>>
>>>>> The feature would deliver great cost savings and possibly greater
>>>>> performance for my use case.
>>>>>
>>>>> After the disappointment of seeing the GitHub PR closed due to
>>>>> inactivity, I was unsure how to re-ignite things, and it struck me
>>>>> that the ASF board report might be a way to highlight the issue.
>>>>>
>>>>> I understand that Structured Streaming is perhaps not the most common
>>>>> use case for Spark and that Spark in and of itself is more of a
>>>>> batch-centric technology. However, I strongly believe that DRA in the
>>>>> long-lived streaming context is possibly even more important than DRA
>>>>> in the batch context. Running a large Hadoop/Spark cluster 24x7 is
>>>>> expensive and could really benefit from the functionality that proper
>>>>> streaming-workload-based DRA could bring.
>>>>>
>>>>> Also, knowing that the PR author has been running this DRA code
>>>>> successfully in his own environment for quite some time now makes it
>>>>> more frustrating. The code had essentially been tested externally
>>>>> before the PR was even raised. It seems to be more than just a
>>>>> theoretical improvement to the codebase.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Adam Hobbs
>>>>>
>>>>> C2 - Internal Use
>>>>>
>>>>> From: Jungtaek Lim <kabhwan.opensou...@gmail.com>
>>>>> Sent: Tuesday, 11 February 2025 8:49 AM
>>>>> To: adam.ho...@bendigoadelaide.com.au.invalid
>>>>> Cc: Matei Zaharia <matei.zaha...@gmail.com>; Spark dev list
>>>>> <dev@spark.apache.org>
>>>>> Subject: Re: ASF board report draft for February 2025
>>>>>
>>>>> CAUTION: This email originated from outside of the organisation. Do
>>>>> not click links or open attachments unless you recognise the sender's
>>>>> full email address and know the content is safe.
>>>>>
>>>>> Thanks Adam for your email.
>>>>>
>>>>> I started to look at these changes when they were proposed, but I am
>>>>> not familiar with DRA. It needed non-trivial context building for me
>>>>> to be effective, which I could not prioritize. I asked my team
>>>>> members to also review, and they were involved, but even they lacked
>>>>> context on how DRA works and on its long-term supportability and
>>>>> maintainability.
>>>>>
>>>>> When possible, I shepherd other initiatives (SPIPs), such as the
>>>>> Arbitrary state processing API. If there are folks in the community
>>>>> who understand DRA and its implications in terms of maintenance, it
>>>>> would be nice for them to share the load and shepherd the project.
>>>>>
>>>>> In any case, this seems to be a prioritization conversation that can
>>>>> perhaps be taken to another thread and not block this ASF board
>>>>> report. Is that OK with you?
>>>>>
>>>>> On Thu, Feb 6, 2025 at 2:30 PM Adam Hobbs
>>>>> <adam.ho...@bendigoadelaide.com.au.invalid> wrote:
>>>>>
>>>>> I'd like to add something about the failure to get any traction on
>>>>> shepherding the Structured Streaming DRA PR. Multiple times now there
>>>>> have been calls for help to get this initiative over the line, and
>>>>> the response has been disappointing. The GitHub PR has been closed
>>>>> due to inaction (https://github.com/apache/spark/pull/42352).
>>>>>
>>>>> This seems like a bit of a failure in the process.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Adam Hobbs
>>>>>
>>>>> -----Original Message-----
>>>>> From: Matei Zaharia <matei.zaha...@gmail.com>
>>>>> Sent: Thursday, 6 February 2025 2:57 PM
>>>>> To: Spark dev list <dev@spark.apache.org>
>>>>> Cc: priv...@spark.apache.org
>>>>> Subject: ASF board report draft for February 2025
>>>>>
>>>>> It’s time to send our next ASF board report again on February 12th.
>>>>> Here’s an initial draft. Feel free to suggest changes:
>>>>>
>>>>> =====================
>>>>>
>>>>> Description:
>>>>>
>>>>> Apache Spark is a fast and general purpose engine for large-scale data
>>>>> processing. It offers high-level APIs in Java, Scala, Python, R and
>>>>> SQL as well as a rich set of libraries including stream processing,
>>>>> machine learning, and graph analytics.
>>>>>
>>>>> Issues for the board:
>>>>>
>>>>> - None
>>>>>
>>>>> Project status:
>>>>>
>>>>> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>>>>>   encourage the community to test it out!
>>>>> - We released Spark 3.5.4 on December 20th, 2024.
>>>>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>>>>>   member (Jie Yang) to the project.
>>>>> - The proposal to "Use plain text logs by default" was successfully
>>>>>   passed.
>>>>>
>>>>> Trademarks:
>>>>>
>>>>> - No changes since last report.
>>>>>
>>>>> Latest releases:
>>>>>
>>>>> - Spark 3.5.4 was released on Dec 20, 2024
>>>>> - Spark 3.4.4 was released on Oct 27, 2024
>>>>> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>>>>>
>>>>> Committers and PMC:
>>>>>
>>>>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
>>>>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>>>>>
>>>>> =====================
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>
>>>>> ********************************************************************************
>>>>>
>>>>> This communication is intended only for use of the addressee and may
>>>>> contain legally privileged and confidential information.
>>>>> If you are not the addressee or intended recipient, you are notified
>>>>> that any dissemination, copying or use of any of the information is
>>>>> unauthorised.
>>>>>
>>>>> The legal privilege and confidentiality attached to this e-mail is not
>>>>> waived, lost or destroyed by reason of a mistaken delivery to you.
>>>>> If you have received this message in error, we would appreciate an
>>>>> immediate notification via e-mail to contac...@bendigoadelaide.com.au
>>>>> or by phoning 1300 BENDIGO (1300 236 344), and ask that the e-mail be
>>>>> permanently deleted from your system.
>>>>>
>>>>> Bendigo and Adelaide Bank Limited ABN 11 068 049 178
>>>>>
>>>>> ********************************************************************************