Re: ASF board report draft for February 2025

Jungtaek Lim Mon, 10 Feb 2025 15:05:17 -0800

Let's move the discussion to the other thread, as it's not relevant to the
board report.


tl;dr. Spark has a crazily large codebase with multiple layers. SS is on
top of SQL, and SQL is on top of CORE. DRA is bound to CORE and is used
especially with specific resource managers like YARN (maybe we have dealt
with K8S too, I don't know). There are not many people who are experts on
multiple layers; I'm an expert on SS and have an understanding of SQL and
only the essentials of CORE, but DRA is definitely an advanced topic in CORE.

It's not that I don't want to incorporate it. I lack the knowledge to
cover it, and I failed to find anyone to do this. Just wanted to clarify.

On Tue, Feb 11, 2025 at 7:56 AM Adam Hobbs <
[email protected]> wrote:

> Thanks for the reply.  I am not fussed how the situation is addressed
> really, but I am just trying to keep the initiative alive.  This isn’t the
> first time I have tried to rescue it.
>
> The feature would deliver great cost savings and possibly greater
> performance for my use case.
>
> After the disappointment of seeing the GitHub PR closed due to inactivity,
> I was unsure how to re-ignite things, and it struck me that the ASF board
> report might be a way to highlight the issue.
>
> I understand that Structured Streaming maybe isn’t the most common use
> case for Spark and that Spark in and of itself is more of a batch-centric
> technology. However, I strongly believe that DRA in the long-lived
> streaming context is possibly even more important than DRA in the batch
> context. Running a large Hadoop/Spark cluster 24x7 is expensive and could
> really benefit from the functionality that proper streaming-workload-based
> DRA could bring.
>
> Also, knowing that the PR author has been successfully running this DRA
> code in his own environment for quite some time now makes it more
> frustrating. The code has essentially been tested externally before the PR
> was even raised. It seems to be more than just a theoretical improvement
> to the codebase.
>
>
>
> Regards,
>
>
>
> Adam Hobbs
>
>
>
>
> C2 - Internal Use
> From: Jungtaek Lim <[email protected]>
> *Sent:* Tuesday, 11 February 2025 8:49 AM
> *To:* [email protected]
> *Cc:* Matei Zaharia <[email protected]>; Spark dev list <
> [email protected]>
> *Subject:* Re: ASF board report draft for February 2025
>
>
>
> CAUTION: This email originated from outside of the organisation. Do not
> click links or open attachments unless you recognise the sender's full
> email address and know the content is safe.
>
>
>
> Thanks Adam for your email.
>
>
>
> I started to look at these changes when they were proposed, but I am not
> familiar with DRA. Being effective would have required non-trivial context
> building, which I could not prioritize. I also asked my team members to
> review, and they were involved, but even they lacked context on how DRA
> works and on its long-term supportability and maintainability.
>
>
>
> When possible, I shepherd other initiatives (SPIPs), such as the Arbitrary
> state processing API. If there are folks in the community who understand
> DRA and its implications in terms of maintenance, it would be nice for them
> to share the load and shepherd the project.
>
>
>
> In any case, this seems to be a prioritization conversation that can
> perhaps be taken in another thread and not block this ASF board report. Is
> that ok for you?
>
>
>
> On Thu, Feb 6, 2025 at 2:30 PM Adam Hobbs <
> [email protected]> wrote:
>
> I'd like to add something around the failure to get any traction on
> shepherding of the Structured Streaming DRA PR. Multiple times now there
> have been calls for help to get this initiative over the line, and the
> response has been disappointing. The GitHub PR has been closed due to
> inaction (https://github.com/apache/spark/pull/42352).
>
> This seems like a bit of a failure in the process.
>
> Regards,
>
> Adam Hobbs
>
>
> -----Original Message-----
> From: Matei Zaharia <[email protected]>
> Sent: Thursday, 6 February 2025 2:57 PM
> To: Spark dev list <[email protected]>
> Cc: [email protected]
> Subject: ASF board report draft for February 2025
>
>
> It’s time to send our next ASF board report again on February 12th. Here’s
> an initial draft — feel free to suggest changes:
>
> =====================
>
>
> Description:
>
> Apache Spark is a fast and general purpose engine for large-scale data
> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
> well as a rich set of libraries including stream processing, machine
> learning, and graph analytics.
>
> Issues for the board:
>
> - None
>
> Project status:
>
> - The Spark 4.0 branch has been cut and has entered the QA stage. We
> encourage the community to test it out!
> - We released Spark 3.5.4 on December 20th, 2024.
> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
> member (Jie Yang) to the project.
> - The proposal to "Use plain text logs by default" was successfully passed.
>
> Trademarks:
>
> - No changes since last report.
>
> Latest releases:
>
> - Spark 3.5.4 was released on Dec 20, 2024
> - Spark 3.4.4 was released on Oct 27, 2024
> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>
> Committers and PMC:
>
> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>
> =====================
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
>
>
> ********************************************************************************
>
> This communication is intended only for use of the addressee and may
> contain legally privileged and confidential information.
> If you are not the addressee or intended recipient, you are notified that
> any dissemination, copying or use of any of the information is unauthorised.
>
> The legal privilege and confidentiality attached to this e-mail is not
> waived, lost or destroyed by reason of a mistaken delivery to you.
> If you have received this message in error, we would appreciate an
> immediate notification via e-mail to [email protected] or
> by phoning 1300 BENDIGO (1300 236 344), and ask that the e-mail be
> permanently deleted from your system.
>
> Bendigo and Adelaide Bank Limited ABN 11 068 049 178
>
>
> ********************************************************************************
>
