I'd like to add that Spark is not as fast as it should be, primarily due to
its internal verbosity, as reported in ticket *SPARK-50992
<https://issues.apache.org/jira/browse/SPARK-50992>*. After submitting
this PR <https://github.com/apache/spark/pull/49724>, I received some
comments, which I quickly addressed, but the PR has since stalled.

I strongly believe that Spark should prioritize performance over internal
logging, especially when it has such a significant impact on execution
speed and can lead to memory issues.

In *GraphFrames*, the temporary workaround was to disable *AQE (Adaptive
Query Execution)*. Just last week, I gave the same advice to a colleague
experiencing performance issues with a *Databricks* notebook—and it worked.
Disabling *AQE* to improve performance, only because Spark continuously
generates string descriptions of physical plans internally that very likely
no one will ever use, makes little sense to me.
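
For reference, a minimal sketch of that workaround as a configuration
fragment. AQE is controlled by the spark.sql.adaptive.enabled property;
placing it in spark-defaults.conf is only one option and is an assumption
here, not necessarily how GraphFrames or the notebook applied it:

```
# spark-defaults.conf
# Turns Adaptive Query Execution off for every session using this config.
spark.sql.adaptive.enabled    false
```

The same property can also be passed per job with
--conf spark.sql.adaptive.enabled=false on spark-submit.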

PS: I wish I were wrong, but I really think I am not.
PS2: Here is the first part of a series of articles I'm writing about this
issue:
<https://medium.com/@angel.alvarez.pascua/apache-spark-wtf-i-like-it-when-a-plan-comes-together-part-i-48c52a667288>

On Thu, Feb 6, 2025 at 6:30, Adam Hobbs
(<adam.ho...@bendigoadelaide.com.au.invalid>) wrote:

> I'd like to add something about the failure to get any traction on
> shepherding the Structured Streaming DRA PR. Multiple times now there
> have been calls for help to get this initiative over the line, and the
> response has been disappointing. The GitHub PR has been closed due to
> inaction (https://github.com/apache/spark/pull/42352).
>
> This seems like a bit of a failure in the process.
>
> Regards,
>
> Adam Hobbs
>
>
> -----Original Message-----
> From: Matei Zaharia <matei.zaha...@gmail.com>
> Sent: Thursday, 6 February 2025 2:57 PM
> To: Spark dev list <dev@spark.apache.org>
> Cc: priv...@spark.apache.org
> Subject: ASF board report draft for February 2025
>
>
> It’s time to send our next ASF board report, due on February 12th. Here’s
> an initial draft; feel free to suggest changes:
>
> =====================
>
>
> Description:
>
> Apache Spark is a fast and general purpose engine for large-scale data
> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
> well as a rich set of libraries including stream processing, machine
> learning, and graph analytics.
>
> Issues for the board:
>
> - None
>
> Project status:
>
> - The Spark 4.0 branch has been cut and has entered the QA stage. We
> encourage the community to test it out!
> - We released Spark 3.5.4 on December 20th, 2024.
> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
> member (Jie Yang) to the project.
> - The proposal to "Use plain text logs by default" was successfully passed.
>
> Trademarks:
>
> - No changes since last report.
>
> Latest releases:
>
> - Spark 3.5.4 was released on Dec 20, 2024
> - Spark 3.4.4 was released on Oct 27, 2024
> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>
> Committers and PMC:
>
> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>
> =====================
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
