I'd like to add that Spark is not as fast as it should be, primarily due to its internal verbosity, as reported in ticket *SPARK-50992 <https://issues.apache.org/jira/browse/SPARK-50992>*. After submitting this PR <https://github.com/apache/spark/pull/49724>, I received some comments, which I quickly addressed, but the PR has since stalled.
I strongly believe that Spark should prioritize performance over internal logging, especially when that logging has such a significant impact on execution speed and can lead to memory issues. In *GraphFrames*, the temporary workaround was to disable *AQE (Adaptive Query Execution)*. Just last week, I gave the same advice to a colleague experiencing performance issues with a *Databricks* notebook, and it worked. Disabling *AQE* to improve performance, because Spark continuously generates internal string descriptions of physical plans that very likely no one will ever use, makes little sense to me.

PS: I wish I were wrong, but I really don't think I am.

PS2: The first part of a series of articles I'm writing about this issue: link <https://medium.com/@angel.alvarez.pascua/apache-spark-wtf-i-like-it-when-a-plan-comes-together-part-i-48c52a667288>

On Thu, Feb 6, 2025 at 6:30, Adam Hobbs (<adam.ho...@bendigoadelaide.com.au.invalid>) wrote:

> I'd like to add something around the failure to get any traction on
> shepherding of the structured streaming DRA PR. Multiple times now there
> have been calls for help to get this initiative over the line and the
> response has been disappointing. The github PR has been closed due to
> inaction (https://github.com/apache/spark/pull/42352).
>
> This seems like a bit of a failure in the process.
>
> Regards,
>
> Adam Hobbs
>
> -----Original Message-----
> From: Matei Zaharia <matei.zaha...@gmail.com>
> Sent: Thursday, 6 February 2025 2:57 PM
> To: Spark dev list <dev@spark.apache.org>
> Cc: priv...@spark.apache.org
> Subject: ASF board report draft for February 2025
>
> It’s time to send our next ASF board report again on February 12th.
> Here’s an initial draft; feel free to suggest changes:
>
> =====================
>
> Description:
>
> Apache Spark is a fast and general purpose engine for large-scale data
> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
> well as a rich set of libraries including stream processing, machine
> learning, and graph analytics.
>
> Issues for the board:
>
> - None
>
> Project status:
>
> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>   encourage the community to test it out!
> - We released Spark 3.5.4 on December 20th, 2024.
> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>   member (Jie Yang) to the project.
> - The proposal to "Use plain text logs by default" was successfully passed.
>
> Trademarks:
>
> - No changes since last report.
>
> Latest releases:
>
> - Spark 3.5.4 was released on Dec 20, 2024
> - Spark 3.4.4 was released on Oct 27, 2024
> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>
> Committers and PMC:
>
> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>
> =====================
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org