Re: ASF board report draft for February 2025

Mich Talebzadeh Wed, 12 Feb 2025 12:20:24 -0800

✅ *Thanks, Matei. ✅ Looks like a plan!*

*📌 We resurrected the old thread!*


https://lists.apache.org/thread/wwjyp1bhryvx7ytooj1lqtd8kgzxb6vq

🔗 Hopefully, there will be more traction this round.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Wed, 12 Feb 2025 at 19:40, Matei Zaharia <[email protected]> wrote:

> I posted the report, but thanks for the feedback. Hopefully we can get
> enough coverage for DRA and the UI issues.
>
> On Feb 11, 2025, at 2:33 AM, Mich Talebzadeh <[email protected]>
> wrote:
>
> Let us carry on in that thread.
>
> I need to catch up.
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
>
>
> On Tue, 11 Feb 2025 at 06:01, Pavan Kotikalapudi <[email protected]>
> wrote:
>
>> Hi Adam,
>>
>> Thanks for bringing this initiative up again with the Spark committers. It
>> resonates with me. This feature has been operational for us (internally) for
>> close to 2 years and has been waiting in the apache/spark codebase for some
>> love.
>>
>> So many people (non-committers) are interested in having this feature
>> delivered. Someone I know has already patched it in at their company and
>> has been running huge workloads with it. I understand that is not the
>> ideal way of doing it, but if you are interested, do let me know and I can
>> help with that (we have integrated it internally into 3 major Spark releases!).
>>
>> Jungtaek, I appreciate that you have already given some insights on this
>> feature and on what other cases need to be handled (I have covered that case
>> as well). I can help with building some context on DRA in core (this is my
>> first contribution to Spark, and I could easily build context on it). We just
>> have to understand ExecutorAllocationManager.scala
>> <https://github.com/pkotikalapudi/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala>.
>> *As you suggested, if there are other committers on the project who
>> already understand DRA and CORE (or are willing to spend some time to
>> understand them), please help in shipping this feature.*
>> By the way, DRA just uses the ExecutorAllocationClient interface
>> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala>,
>> so it should help us cover all the resource managers (I have personally
>> run it in standalone and k8s).
>>
>> As Mich and Jungtaek suggested, let's move the discussion to this thread
>> <https://lists.apache.org/thread/wpvtvf4w3zygtkfgq4sthbf00y5pqxvr>, or
>> create a new one so that other committers can also pitch in.
>>
>> Thank you,
>>
>> Pavan
>>
>> On Tue, Feb 11, 2025 at 4:46 AM Mich Talebzadeh <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> OK, this DRA work has already had a thread here:
>>>
>>> Vote on Dynamic resource allocation for structured streaming
>>> [SPARK-24815]
>>>
>>> I recall I asked a committer to open the PR, and it was opened and then
>>> closed because of inactivity. Pavan Kotikalapudi was working on it.
>>>
>>> Happy to chip in and help where I can
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, 10 Feb 2025 at 23:05, Jungtaek Lim <[email protected]>
>>> wrote:
>>>
>>>> Let's move the discussion to the other thread, as it's not relevant to
>>>> the board report.
>>>>
>>>> tl;dr: Spark has a crazily large codebase with multiple layers. SS
>>>> is on top of SQL and SQL is on top of CORE. DRA is bound to CORE and is
>>>> used especially with specific resource managers like YARN (maybe we have
>>>> dealt with K8S, I don't know). There are not many people who are experts on
>>>> multiple layers; I'm an expert on SS and have an understanding of SQL and of
>>>> the very essentials of CORE, but DRA is definitely an advanced topic in CORE.
>>>>
>>>> It's not because I don't want to incorporate it. I'm lacking knowledge
>>>> to cover that, and I failed to find anyone to do this. Just wanted to
>>>> clarify.
>>>>
>>>> On Tue, Feb 11, 2025 at 7:56 AM Adam Hobbs <
>>>> [email protected]> wrote:
>>>>
>>>>> Thanks for the reply.  I am not fussed how the situation is addressed
>>>>> really, but I am just trying to keep the initiative alive.  This isn’t the
>>>>> first time I have tried to rescue it.
>>>>>
>>>>> The feature would deliver great cost savings and possibly greater
>>>>> performance for my use case.
>>>>>
>>>>> After the disappointment of seeing the GitHub PR closed due to
>>>>> inactivity, I was unsure how to re-ignite things, and it struck me that
>>>>> the ASF board report might be a way to highlight the issue.
>>>>>
>>>>> I understand that Structured Streaming may not be the most common use
>>>>> case for Spark and that Spark in and of itself is more of a batch-centric
>>>>> technology. However, I strongly believe that DRA in the long-lived
>>>>> streaming context is possibly even more important than DRA in the batch
>>>>> context. Running a large Hadoop/Spark cluster 24x7 is expensive and could
>>>>> really benefit from the functionality that proper streaming-workload-based
>>>>> DRA could bring.
>>>>>
>>>>> Also, knowing that the PR author has been successfully running this DRA
>>>>> code in his own environment for quite some time now makes it more
>>>>> frustrating. The code had essentially been tested externally before the
>>>>> PR was even raised. It seems to be more than just a theoretical
>>>>> improvement to the codebase.
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>>
>>>>>
>>>>> Adam Hobbs
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> C2 - Internal Use
>>>>> From: Jungtaek Lim <[email protected]>
>>>>> *Sent:* Tuesday, 11 February 2025 8:49 AM
>>>>> *To:* [email protected]
>>>>> *Cc:* Matei Zaharia <[email protected]>; Spark dev list <
>>>>> [email protected]>
>>>>> *Subject:* Re: ASF board report draft for February 2025
>>>>>
>>>>>
>>>>>
>>>>> CAUTION: This email originated from outside of the organisation. Do
>>>>> not click links or open attachments unless you recognise the sender's full
>>>>> email address and know the content is safe.
>>>>>
>>>>>
>>>>>
>>>>> Thanks Adam for your email.
>>>>>
>>>>>
>>>>>
>>>>> I started to look at these changes when they were proposed, but I am not
>>>>> familiar with DRA. It needed non-trivial context building for me to be
>>>>> effective, which I could not prioritize. I asked my team members to also
>>>>> review and they were involved, but even they lacked context on how DRA
>>>>> works and on its long-term supportability and maintainability.
>>>>>
>>>>>
>>>>>
>>>>> When possible I shepherd other initiatives (SPIPs), such as the Arbitrary
>>>>> state processing API. If there are folks in the community who understand
>>>>> DRA and its implications in terms of maintenance, it would be nice for
>>>>> them to share the load and shepherd the project.
>>>>>
>>>>>
>>>>>
>>>>> In any case, this seems to be a prioritization conversation that can
>>>>> perhaps be taken in another thread and not block this ASF board report. Is
>>>>> that ok for you?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Feb 6, 2025 at 2:30 PM Adam Hobbs <
>>>>> [email protected]> wrote:
>>>>>
>>>>> I'd like to add something around the failure to get any traction on
>>>>> shepherding of the structured streaming DRA PR.  Multiple times now there
>>>>> have been calls for help to get this initiative over the line and the
>>>>> response has been disappointing.  The GitHub PR has been closed due to
>>>>> inaction (https://github.com/apache/spark/pull/42352).
>>>>>
>>>>> This seems like a bit of a failure in the process.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Adam Hobbs
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Matei Zaharia <[email protected]>
>>>>> Sent: Thursday, 6 February 2025 2:57 PM
>>>>> To: Spark dev list <[email protected]>
>>>>> Cc: [email protected]
>>>>> Subject: ASF board report draft for February 2025
>>>>>
>>>>>
>>>>> It’s time to send our next ASF board report again on February 12th.
>>>>> Here’s an initial draft — feel free to suggest changes:
>>>>>
>>>>> =====================
>>>>>
>>>>>
>>>>> Description:
>>>>>
>>>>> Apache Spark is a fast and general purpose engine for large-scale data
>>>>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>>>>> well as a rich set of libraries including stream processing, machine
>>>>> learning, and graph analytics.
>>>>>
>>>>> Issues for the board:
>>>>>
>>>>> - None
>>>>>
>>>>> Project status:
>>>>>
>>>>> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>>>>> encourage the community to test it out!
>>>>> - We released Spark 3.5.4 on December 20th, 2024.
>>>>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>>>>> member (Jie Yang) to the project.
>>>>> - The proposal to "Use plain text logs by default" was successfully
>>>>> passed.
>>>>>
>>>>> Trademarks:
>>>>>
>>>>> - No changes since last report.
>>>>>
>>>>> Latest releases:
>>>>>
>>>>> - Spark 3.5.4 was released on Dec 20, 2024
>>>>> - Spark 3.4.4 was released on Oct 27, 2024
>>>>> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>>>>>
>>>>> Committers and PMC:
>>>>>
>>>>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
>>>>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>>>>>
>>>>> =====================
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: [email protected]
>>>>>
>>>>>
>>>>> ********************************************************************************
>>>>>
>>>>> This communication is intended only for use of the addressee and may
>>>>> contain legally privileged and confidential information.
>>>>> If you are not the addressee or intended recipient, you are notified
>>>>> that any dissemination, copying or use of any of the information is
>>>>> unauthorised.
>>>>>
>>>>> The legal privilege and confidentiality attached to this e-mail is not
>>>>> waived, lost or destroyed by reason of a mistaken delivery to you.
>>>>> If you have received this message in error, we would appreciate an
>>>>> immediate notification via e-mail to [email protected]
>>>>> or by phoning 1300 BENDIGO (1300 236 344), and ask that the e-mail be
>>>>> permanently deleted from your system.
>>>>>
>>>>> Bendigo and Adelaide Bank Limited ABN 11 068 049 178
>>>>>
>>>>>
>>>>> ********************************************************************************
>>>>>
>>>>>
>>>>
>
