Re: ASF board report draft for February 2025

Mich Talebzadeh Tue, 11 Feb 2025 02:34:44 -0800

Let us carry on in that thread.

I need to catch up.


HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Tue, 11 Feb 2025 at 06:01, Pavan Kotikalapudi <pkotikalap...@twilio.com>
wrote:

> Hi Adam,
>
> Thanks for bringing this initiative to the attention of Spark committers
> again. I can relate to that. This feature has been operational for us
> (internally) for close to 2 years and has been waiting in the apache/spark
> codebase for some love.
>
> So many people (non-committers) are interested in having this feature
> delivered. Someone I know has already patched it in at their company and
> has been running huge workloads with it. I understand that is not the
> ideal way of doing it, but if you are interested, do let me know and I can
> help with that (we have internally integrated it into 3 major Spark
> releases!).
>
> Jungtaek, I appreciate that you have already given some insights on this
> feature and on what other cases need to be handled (I have covered that
> case as well). I can help with building some context on DRA in core (this
> is my first contribution to Spark, and I could easily build context on
> it). We just have to understand ExecutorAllocationManager.scala
> <https://github.com/pkotikalapudi/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala>.
> *As you suggested, if there are other committers of the project who
> already understand DRA and CORE (or are willing to spend some time to
> understand them), please help in shipping this feature.*
> By the way, DRA just uses the ExecutorAllocationClient interface
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala>.
> So it should help us cover all the resource managers (I have personally
> run it in standalone and k8s).
>
> As Mich and Jungtaek suggested, let's move the discussion to this thread
> <https://lists.apache.org/thread/wpvtvf4w3zygtkfgq4sthbf00y5pqxvr>, or
> create a new one so that other committers can also pitch in.
>
> Thank you,
>
> Pavan
>
> On Tue, Feb 11, 2025 at 4:46 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> OK, this DRA work has already had a thread here:
>>
>> Vote on Dynamic resource allocation for structured streaming
>> [SPARK-24815]
>>
>> I recall I asked a committer to open the PR, and it was opened and then
>> closed because of inactivity. Pavan Kotikalapudi was working on it.
>>
>> Happy to chip in and help where I can
>>
>> HTH
>>
>> Dr Mich Talebzadeh,
>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>
>>    view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>
>>
>>
>> On Mon, 10 Feb 2025 at 23:05, Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>
>>> Let's move the discussion to the other thread, as it's not relevant to
>>> the board report.
>>>
>>> tl;dr. Spark has a crazily large codebase with multiple layers. SS is
>>> on top of SQL and SQL is on top of CORE. DRA is bound to CORE, and is
>>> especially used with specific resource managers like YARN (maybe we have
>>> dealt with K8S too, I don't know). There are not many people who are
>>> experts on multiple layers; I'm an expert on SS and have an understanding
>>> of SQL and only the essentials of CORE, but DRA is definitely an advanced
>>> topic in CORE.
>>>
>>> It's not because I don't want to incorporate it. I'm lacking knowledge
>>> to cover that, and I failed to find anyone to do this. Just wanted to
>>> clarify.
>>>
>>> On Tue, Feb 11, 2025 at 7:56 AM Adam Hobbs <
>>> adam.ho...@bendigoadelaide.com.au> wrote:
>>>
>>>> Thanks for the reply.  I am not fussed how the situation is addressed
>>>> really, but I am just trying to keep the initiative alive.  This isn’t the
>>>> first time I have tried to rescue it.
>>>>
>>>> The feature would deliver great cost savings and possibly greater
>>>> performance for my use case.
>>>>
>>>> After the disappointment of seeing the GitHub PR closed due to
>>>> inactivity, I was unsure how to re-ignite things, and it struck me that
>>>> the ASF board report might be a way to highlight the issue.
>>>>
>>>> I understand that Structured Streaming may not be the most common use
>>>> case for Spark and that Spark in and of itself is more of a batch-centric
>>>> technology. However, I strongly believe that DRA in the long-lived
>>>> streaming context is possibly even more important than DRA in the batch
>>>> context.  Running a large Hadoop/Spark cluster 24x7 is expensive and
>>>> could really benefit from the functionality that proper
>>>> streaming-workload-based DRA could bring.
>>>>
>>>> Also, knowing that the PR author has been successfully running this DRA
>>>> code in his own environment for quite some time makes it more
>>>> frustrating.  The code has essentially been tested externally before the PR
>>>> was even raised.  It seems to be more than just a theoretical improvement
>>>> to the codebase.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>>
>>>>
>>>> Adam Hobbs
>>>>
>>>>
>>>>
>>>>
>>>> C2 - Internal Use
>>>> From: Jungtaek Lim <kabhwan.opensou...@gmail.com>
>>>> *Sent:* Tuesday, 11 February 2025 8:49 AM
>>>> *To:* adam.ho...@bendigoadelaide.com.au.invalid
>>>> *Cc:* Matei Zaharia <matei.zaha...@gmail.com>; Spark dev list <
>>>> dev@spark.apache.org>
>>>> *Subject:* Re: ASF board report draft for February 2025
>>>>
>>>>
>>>>
>>>> CAUTION: This email originated from outside of the organisation. Do
>>>> not click links or open attachments unless you recognise the sender's full
>>>> email address and know the content is safe.
>>>>
>>>>
>>>>
>>>> Thanks Adam for your email.
>>>>
>>>>
>>>>
>>>> I started to look at these changes when they were proposed, but I am not
>>>> familiar with DRA. I would have needed non-trivial context building to be
>>>> effective, which I could not prioritize. I asked my team members to also
>>>> review, and they were involved, but even they lacked context on how DRA
>>>> works and on its long-term supportability and maintainability.
>>>>
>>>>
>>>>
>>>> When possible I shepherd other initiatives (SPIPs), such as the Arbitrary
>>>> state processing API. If there are folks in the community who understand
>>>> DRA and its implications in terms of maintenance, it would be nice for
>>>> them to share the load and shepherd the project.
>>>>
>>>>
>>>>
>>>> In any case, this seems to be a prioritization conversation that can
>>>> perhaps be taken in another thread and not block this ASF board report. Is
>>>> that ok for you?
>>>>
>>>>
>>>>
>>>> On Thu, Feb 6, 2025 at 2:30 PM Adam Hobbs <
>>>> adam.ho...@bendigoadelaide.com.au.invalid> wrote:
>>>>
>>>> I'd like to add something around the failure to get any traction on
>>>> shepherding of the Structured Streaming DRA PR.  Multiple times now there
>>>> have been calls for help to get this initiative over the line and the
>>>> response has been disappointing.  The GitHub PR has been closed due to
>>>> inaction (https://github.com/apache/spark/pull/42352).
>>>>
>>>> This seems like a bit of a failure in the process.
>>>>
>>>> Regards,
>>>>
>>>> Adam Hobbs
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Matei Zaharia <matei.zaha...@gmail.com>
>>>> Sent: Thursday, 6 February 2025 2:57 PM
>>>> To: Spark dev list <dev@spark.apache.org>
>>>> Cc: priv...@spark.apache.org
>>>> Subject: ASF board report draft for February 2025
>>>>
>>>>
>>>>
>>>> It’s time to send our next ASF board report again on February 12th.
>>>> Here’s an initial draft — feel free to suggest changes:
>>>>
>>>> =====================
>>>>
>>>>
>>>> Description:
>>>>
>>>> Apache Spark is a fast and general purpose engine for large-scale data
>>>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>>>> well as a rich set of libraries including stream processing, machine
>>>> learning, and graph analytics.
>>>>
>>>> Issues for the board:
>>>>
>>>> - None
>>>>
>>>> Project status:
>>>>
>>>> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>>>> encourage the community to test it out!
>>>> - We released Spark 3.5.4 on December 20th, 2024.
>>>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>>>> member (Jie Yang) to the project.
>>>> - The proposal to "Use plain text logs by default" was successfully
>>>> passed.
>>>>
>>>> Trademarks:
>>>>
>>>> - No changes since last report.
>>>>
>>>> Latest releases:
>>>>
>>>> - Spark 3.5.4 was released on Dec 20, 2024
>>>> - Spark 3.4.4 was released on Oct 27, 2024
>>>> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>>>>
>>>> Committers and PMC:
>>>>
>>>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
>>>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>>>>
>>>> =====================
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>
>>>>
>>>> ********************************************************************************
>>>>
>>>> This communication is intended only for use of the addressee and may
>>>> contain legally privileged and confidential information.
>>>> If you are not the addressee or intended recipient, you are notified
>>>> that any dissemination, copying or use of any of the information is
>>>> unauthorised.
>>>>
>>>> The legal privilege and confidentiality attached to this e-mail is not
>>>> waived, lost or destroyed by reason of a mistaken delivery to you.
>>>> If you have received this message in error, we would appreciate an
>>>> immediate notification via e-mail to contac...@bendigoadelaide.com.au
>>>> or by phoning 1300 BENDIGO (1300 236 344), and ask that the e-mail be
>>>> permanently deleted from your system.
>>>>
>>>> Bendigo and Adelaide Bank Limited ABN 11 068 049 178
>>>>
>>>>
>>>> ********************************************************************************
>>>>
>>>
