Btw, while analyzing this issue, I've also noticed that exactly the same
plan got stringified several times. Not only that, but even within a plan,
the same nodes got stringified dozens and dozens of times. I haven't
reported it because I added memoization to fix both things and, even with
that fix in place, the root issue with performance and OOM still
persisted.

PS: Some nodes got stringified thousands of times. I was ... totally in
shock nobody had noticed it before.
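
To make the memoization idea concrete, here is a minimal sketch in Python. It
is not Spark code: PlanNode, PlanPrinter and build_example_plan are
hypothetical names made up for illustration, and the plan is a toy tree in
which the same node objects are reused in several places. It only shows how
caching the rendered string per node cuts the number of render calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class PlanNode:
    """Hypothetical stand-in for a physical plan node (not a Spark class)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


class PlanPrinter:
    """Renders a plan to a string and counts how many nodes were actually rendered."""

    def __init__(self, memoize: bool) -> None:
        self.memoize = memoize
        self.render_calls = 0
        self._cache: Dict[PlanNode, str] = {}

    def render(self, node: PlanNode) -> str:
        # Return the cached string if this exact node object was rendered before.
        if self.memoize and node in self._cache:
            return self._cache[node]
        self.render_calls += 1
        parts = [self.render(child) for child in node.children]
        text = node.name + ("(" + ", ".join(parts) + ")" if parts else "")
        if self.memoize:
            self._cache[node] = text
        return text


def build_example_plan() -> PlanNode:
    """Toy plan in which the same subtrees are referenced more than once."""
    scan = PlanNode("Scan")
    filt = PlanNode("Filter", [scan])
    join = PlanNode("Join", [filt, filt])
    return PlanNode("Union", [join, join])


if __name__ == "__main__":
    root = build_example_plan()
    for memoize in (False, True):
        printer = PlanPrinter(memoize=memoize)
        printer.render(root)
        printer.render(root)  # the same plan is stringified a second time
        print(f"memoize={memoize}: {printer.render_calls} node renders")
```

Without the cache, every repeated reference to a node and every repeated
request for the same plan walks the whole subtree again. With the cache, each
distinct node is rendered once.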

On Thu, Feb 6, 2025 at 8:55, Ángel (<angel.alvarez.pas...@gmail.com>)
wrote:

> If I'm not wrong, the events were still being generated and stored and
> contained the plans (but without the description). Maybe we could just
> simply... generate the strings "on demand" in a lazy fashion, when the user
> requests it on Spark UI.
>
> I don't know if that's even possible, just thought about it while walking
> my dog ...🐶
>
> On Thu, Feb 6, 2025, 8:41, Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> Hi Angel,
>>
>> AFAIK many people rely on the Spark UI to debug/inspect their queries
>> with the query plan tree and metrics, but you are right that plan string
>> generation is expensive, and we shouldn't do it for every AQE plan change.
>> Maybe we should do it only once to report the final plan for AQE? Let's
>> continue the discussion on the PR.
>>
>> On Thu, Feb 6, 2025 at 1:48 PM Ángel <angel.alvarez.pas...@gmail.com>
>> wrote:
>>
>>> I'd like to add that Spark is not as fast as it should be, primarily due
>>> to its internal verbosity, as reported in ticket *SPARK-50992
>>> <https://issues.apache.org/jira/browse/SPARK-50992>*. After submitting
>>> this PR <https://github.com/apache/spark/pull/49724>, I received some
>>> comments, which I quickly addressed, but the PR has since stalled.
>>>
>>> I strongly believe that Spark should prioritize performance over
>>> internal logging, especially when it has such a significant impact on
>>> execution speed and can lead to memory issues.
>>>
>>> In *GraphFrames*, the temporary workaround was to disable *AQE
>>> (Adaptive Query Execution)*. Just last week, I gave the same advice to
>>> a colleague experiencing performance issues with a *Databricks*
>>> notebook, and it worked. Disabling *AQE* to improve performance because
>>> Spark continuously generates string descriptions of physical plans
>>> internally (descriptions that very likely no one is ever going to use)
>>> makes little sense to me.
>>> PS: I wish I was wrong, but I really think I am not.
>>> PS2: The first part of a series of articles I'm writing about this issue:
>>> link
>>> <https://medium.com/@angel.alvarez.pascua/apache-spark-wtf-i-like-it-when-a-plan-comes-together-part-i-48c52a667288>
>>>
>>> On Thu, Feb 6, 2025 at 6:30, Adam Hobbs
>>> (<adam.ho...@bendigoadelaide.com.au.invalid>) wrote:
>>>
>>>> I'd like to add something around the failure to get any traction on
>>>> shepherding the structured streaming DRA PR.  Multiple times now there
>>>> have been calls for help to get this initiative over the line and the
>>>> response has been disappointing.  The github PR has been closed due to
>>>> inaction (https://github.com/apache/spark/pull/42352).
>>>>
>>>> This seems like a bit of a failure in the process.
>>>>
>>>> Regards,
>>>>
>>>> Adam Hobbs
>>>>
>>>> -----Original Message-----
>>>> From: Matei Zaharia <matei.zaha...@gmail.com>
>>>> Sent: Thursday, 6 February 2025 2:57 PM
>>>> To: Spark dev list <dev@spark.apache.org>
>>>> Cc: priv...@spark.apache.org
>>>> Subject: ASF board report draft for February 2025
>>>>
>>>> It’s time to send our next ASF board report again on February 12th.
>>>> Here’s an initial draft — feel free to suggest changes:
>>>>
>>>> =====================
>>>>
>>>>
>>>> Description:
>>>>
>>>> Apache Spark is a fast and general purpose engine for large-scale data
>>>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>>>> well as a rich set of libraries including stream processing, machine
>>>> learning, and graph analytics.
>>>>
>>>> Issues for the board:
>>>>
>>>> - None
>>>>
>>>> Project status:
>>>>
>>>> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>>>> encourage the community to test it out!
>>>> - We released Spark 3.5.4 on December 20th, 2024.
>>>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>>>> member (Jie Yang) to the project.
>>>> - The proposal to "Use plain text logs by default" was successfully
>>>> passed.
>>>>
>>>> Trademarks:
>>>>
>>>> - No changes since last report.
>>>>
>>>> Latest releases:
>>>>
>>>> - Spark 3.5.4 was released on Dec 20, 2024
>>>> - Spark 3.4.4 was released on Oct 27, 2024
>>>> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>>>>
>>>> Committers and PMC:
>>>>
>>>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
>>>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>>>>
>>>> =====================
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>
