Hi all,

OK, this DRA topic has already had a thread here:

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

I recall I asked a committer to open the PR and it was opened and
closed because of inactivity. Pavan Kotikalapudi was working on it.

Happy to chip in and help where I can

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Mon, 10 Feb 2025 at 23:05, Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Let's move the discussion to the other thread, as it's not relevant to the
> board report.
>
> tl;dr. Spark has a crazily large codebase and has multiple layers. SS is
> on top of SQL and SQL is on top of CORE. DRA is bound to CORE, especially
> used for specific resource managers like YARN (maybe we had dealt with K8S,
> I don't know.) There are not many people who are experts on multiple
> layers; I'm an expert on SS and have an understanding of SQL and only the
> very essentials of CORE, but DRA is definitely an advanced topic in CORE.
>
> It's not because I don't want to incorporate it. I'm lacking knowledge to
> cover that, and I failed to find anyone to do this. Just wanted to clarify.
>
> On Tue, Feb 11, 2025 at 7:56 AM Adam Hobbs <
> adam.ho...@bendigoadelaide.com.au> wrote:
>
>> Thanks for the reply.  I am not fussed how the situation is addressed
>> really, but I am just trying to keep the initiative alive.  This isn’t the
>> first time I have tried to rescue it.
>>
>> The feature would deliver great cost savings and possibly greater
>> performance for my use case.
>>
>> After the disappointment of seeing the GitHub PR closed due to inactivity
>> I was unsure how to re-ignite things and it struck me that maybe the ASF
>> board report may be a way to highlight the issue.
>>
>> I understand that Structured Streaming maybe isn’t the most common use
>> case for Spark and that Spark in and of itself is more of a batch-centric
>> technology; however, I strongly believe that DRA in the long-lived streaming
>> context is possibly even more important than DRA in the batch context.  Running
>> a large Hadoop/Spark cluster 24x7 is expensive and could really benefit
>> from the functionality that proper streaming-workload-based DRA could
>> bring.
>>
>> Also, knowing that the PR author has been running this DRA code in his
>> own environment for quite some time now successfully, makes it more
>> frustrating.  The code has essentially been tested externally before the PR
>> was even raised.  It seems to be more than just a theoretical improvement
>> to the codebase.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Adam Hobbs
>>
>>
>>
>>
>> C2 - Internal Use
>> From: Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> *Sent:* Tuesday, 11 February 2025 8:49 AM
>> *To:* adam.ho...@bendigoadelaide.com.au.invalid
>> *Cc:* Matei Zaharia <matei.zaha...@gmail.com>; Spark dev list <
>> dev@spark.apache.org>
>> *Subject:* Re: ASF board report draft for February 2025
>>
>>
>>
>> CAUTION: This email originated from outside of the organisation. Do not
>> click links or open attachments unless you recognise the sender's full
>> email address and know the content is safe.
>>
>>
>>
>> Thanks Adam for your email.
>>
>>
>>
>> I started to look at these changes when proposed but I am not familiar
>> with DRA. It needed non-trivial context building for me to be effective,
>> which I could not prioritize. I asked my team members to also review and
>> they were involved, but even they lacked context on how DRA works and on
>> its long-term supportability and maintainability.
>>
>>
>>
>> When possible I shepherd other initiatives (SPIP), such as Arbitrary
>> state processing API. If there are folks in the community who understand
>> DRA and its implications in terms of maintenance, it would be nice for
>> them to share the load and shepherd the project.
>>
>>
>>
>> In any case, this seems to be a prioritization conversation that can
>> perhaps be taken in another thread and not block this ASF board report. Is
>> that ok for you?
>>
>>
>>
>> On Thu, Feb 6, 2025 at 2:30 PM Adam Hobbs <
>> adam.ho...@bendigoadelaide.com.au.invalid> wrote:
>>
>> I'd like to add something around the failure to get any traction on
>> shepherding of the structured streaming DRA PR.  Multiple times now there
>> have been calls for help to get this initiative over the line and the
>> response has been disappointing.  The GitHub PR has been closed due to
>> inaction (https://github.com/apache/spark/pull/42352).
>>
>> This seems like a bit of a failure in the process.
>>
>> Regards,
>>
>> Adam Hobbs
>>
>>
>> -----Original Message-----
>> From: Matei Zaharia <matei.zaha...@gmail.com>
>> Sent: Thursday, 6 February 2025 2:57 PM
>> To: Spark dev list <dev@spark.apache.org>
>> Cc: priv...@spark.apache.org
>> Subject: ASF board report draft for February 2025
>>
>>
>> It’s time to send our next ASF board report again on February 12th.
>> Here’s an initial draft — feel free to suggest changes:
>>
>> =====================
>>
>>
>> Description:
>>
>> Apache Spark is a fast and general purpose engine for large-scale data
>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>> well as a rich set of libraries including stream processing, machine
>> learning, and graph analytics.
>>
>> Issues for the board:
>>
>> - None
>>
>> Project status:
>>
>> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>> encourage the community to test it out!
>> - We released Spark 3.5.4 on December 20th, 2024.
>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>> member (Jie Yang) to the project.
>> - The proposal to "Use plain text logs by default" was successfully
>> passed.
>>
>> Trademarks:
>>
>> - No changes since last report.
>>
>> Latest releases:
>>
>> - Spark 3.5.4 was released on Dec 20, 2024
>> - Spark 3.4.4 was released on Oct 27, 2024
>> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>>
>> Committers and PMC:
>>
>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>>
>> =====================
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>> ********************************************************************************
>>
>> This communication is intended only for use of the addressee and may
>> contain legally privileged and confidential information.
>> If you are not the addressee or intended recipient, you are notified that
>> any dissemination, copying or use of any of the information is unauthorised.
>>
>> The legal privilege and confidentiality attached to this e-mail is not
>> waived, lost or destroyed by reason of a mistaken delivery to you.
>> If you have received this message in error, we would appreciate an
>> immediate notification via e-mail to contac...@bendigoadelaide.com.au or
>> by phoning 1300 BENDIGO (1300 236 344), and ask that the e-mail be
>> permanently deleted from your system.
>>
>> Bendigo and Adelaide Bank Limited ABN 11 068 049 178
>>
>>
>> ********************************************************************************
>>
>>
>
