Hi all,

OK, this DRA has already had a thread here:

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

I recall I asked a committer to open the PR; it was opened and later closed because of inactivity. Pavan Kotikalapudi was working on it.

Happy to chip in and help where I can.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Mon, 10 Feb 2025 at 23:05, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Let's move the discussion to the other thread, as it's not relevant to the
> board report.
>
> tl;dr. Spark has a crazily large codebase and has multiple layers. SS is
> on top of SQL and SQL is on top of CORE. DRA is bound to CORE, especially
> used for specific resource managers like YARN (maybe we had dealt with K8S,
> I don't know.) There are not many people who are experts on multiple
> layers; I'm an expert on SS and have an understanding of SQL and only the
> very essentials of CORE, but DRA is definitely an advanced topic in CORE.
>
> It's not because I don't want to incorporate it. I'm lacking the knowledge
> to cover that, and I failed to find anyone to do this. Just wanted to clarify.
>
> On Tue, Feb 11, 2025 at 7:56 AM Adam Hobbs <adam.ho...@bendigoadelaide.com.au> wrote:
>
>> Thanks for the reply. I am not fussed how the situation is addressed
>> really, but I am just trying to keep the initiative alive. This isn’t the
>> first time I have tried to rescue it.
>>
>> The feature would deliver great cost savings and possibly greater
>> performance for my use case.
>>
>> After the disappointment of seeing the GitHub PR closed due to inactivity,
>> I was unsure how to re-ignite things, and it struck me that the ASF board
>> report might be a way to highlight the issue.
>>
>> I understand that Structured Streaming is perhaps not the most common use
>> case for Spark and that Spark in and of itself is more of a batch-centric
>> technology; however, I strongly believe that DRA in the long-lived streaming
>> context is possibly even more important than DRA in the batch context. Running
>> a large Hadoop/Spark cluster 24x7 is expensive and could really benefit
>> from the functionality that proper streaming-workload-based DRA could
>> bring.
>>
>> Also, knowing that the PR author has been successfully running this DRA
>> code in his own environment for quite some time now makes it more
>> frustrating. The code has essentially been tested externally before the PR
>> was even raised. It seems to be more than just a theoretical improvement
>> to the codebase.
>>
>> Regards,
>>
>> Adam Hobbs
>>
>> C2 - Internal Use
>>
>> From: Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> *Sent:* Tuesday, 11 February 2025 8:49 AM
>> *To:* adam.ho...@bendigoadelaide.com.au.invalid
>> *Cc:* Matei Zaharia <matei.zaha...@gmail.com>; Spark dev list <dev@spark.apache.org>
>> *Subject:* Re: ASF board report draft for February 2025
>>
>> CAUTION: This email originated from outside of the organisation. Do not
>> click links or open attachments unless you recognise the sender's full
>> email address and know the content is safe.
>>
>> Thanks Adam for your email.
>>
>> I started to look at these changes when they were proposed, but I am not
>> familiar with DRA. It needed non-trivial context building for me to be
>> effective, which I could not prioritize. I asked my team members to also
>> review and they were involved, but even they lacked context on how DRA
>> works and on its long-term supportability and maintainability.
>>
>> When possible I shepherd other initiatives (SPIP), such as Arbitrary
>> state processing API.
>> If there are folks in the community who understand DRA and its
>> implications in terms of maintenance, it would be nice for them to
>> share the load and shepherd the project.
>>
>> In any case, this seems to be a prioritization conversation that can
>> perhaps be taken in another thread and not block this ASF board report. Is
>> that ok for you?
>>
>> On Thu, Feb 6, 2025 at 2:30 PM Adam Hobbs <adam.ho...@bendigoadelaide.com.au.invalid> wrote:
>>
>> I'd like to add something around the failure to get any traction on
>> shepherding of the structured streaming DRA PR. Multiple times now there
>> have been calls for help to get this initiative over the line and the
>> response has been disappointing. The GitHub PR has been closed due to
>> inaction (https://github.com/apache/spark/pull/42352).
>>
>> This seems like a bit of a failure in the process.
>>
>> Regards,
>>
>> Adam Hobbs
>>
>> C2 - Internal Use
>>
>> -----Original Message-----
>> From: Matei Zaharia <matei.zaha...@gmail.com>
>> Sent: Thursday, 6 February 2025 2:57 PM
>> To: Spark dev list <dev@spark.apache.org>
>> Cc: priv...@spark.apache.org
>> Subject: ASF board report draft for February 2025
>>
>> It’s time to send our next ASF board report again on February 12th.
>> Here’s an initial draft; feel free to suggest changes:
>>
>> =====================
>>
>> Description:
>>
>> Apache Spark is a fast and general purpose engine for large-scale data
>> processing.
>> It offers high-level APIs in Java, Scala, Python, R and SQL as
>> well as a rich set of libraries including stream processing, machine
>> learning, and graph analytics.
>>
>> Issues for the board:
>>
>> - None
>>
>> Project status:
>>
>> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>>   encourage the community to test it out!
>> - We released Spark 3.5.4 on December 20th, 2024.
>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>>   member (Jie Yang) to the project.
>> - The proposal to "Use plain text logs by default" was successfully
>>   passed.
>>
>> Trademarks:
>>
>> - No changes since last report.
>>
>> Latest releases:
>>
>> - Spark 3.5.4 was released on Dec 20, 2024
>> - Spark 3.4.4 was released on Oct 27, 2024
>> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>>
>> Committers and PMC:
>>
>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>>
>> =====================
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>> ********************************************************************************
>>
>> This communication is intended only for use of the addressee and may
>> contain legally privileged and confidential information.
>> If you are not the addressee or intended recipient, you are notified that
>> any dissemination, copying or use of any of the information is unauthorised.
>>
>> The legal privilege and confidentiality attached to this e-mail is not
>> waived, lost or destroyed by reason of a mistaken delivery to you.
>> If you have received this message in error, we would appreciate an
>> immediate notification via e-mail to contac...@bendigoadelaide.com.au or
>> by phoning 1300 BENDIGO (1300 236 344), and ask that the e-mail be
>> permanently deleted from your system.
>>
>> Bendigo and Adelaide Bank Limited ABN 11 068 049 178
>>
>> ********************************************************************************