[
https://issues.apache.org/jira/browse/SPARK-29562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun reassigned SPARK-29562:
-------------------------------------
Assignee: Marcelo Masiero Vanzin
> SQLAppStatusListener metrics aggregation is slow and memory hungry
> ------------------------------------------------------------------
>
> Key: SPARK-29562
> URL: https://issues.apache.org/jira/browse/SPARK-29562
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Marcelo Masiero Vanzin
> Assignee: Marcelo Masiero Vanzin
> Priority: Major
>
> While {{SQLAppStatusListener}} was added in 2.3, the aggregation code is very
> similar to what it was previously, so I'm sure this is even older.
> Long story short, the aggregation code
> ({{SQLAppStatusListener.aggregateMetrics}}) is very, very slow, and can take
> a non-trivial amount of time with large queries, aside from using a ton of
> memory.
> There are also cascading issues caused by that: since it's called from an
> event handler, it can slow down event processing, causing events to be
> dropped, which can cause listeners to miss important events that would tell
> them to free up internal state (and, thus, memory).
> To given an anecdotal example, one app I looked at ran into the "events being
> dropped" issue, which caused the listener to accumulate state for 100s of
> live stages, even though most of them were already finished. That lead to a
> few GB of memory being wasted due to finished stages that were still being
> tracked.
> Here, though, I'd like to focus on {{SQLAppStatusListener.aggregateMetrics}}
> and making it faster. We should look at the other issues (unblocking event
> processing, cleaning up of stale data in listeners) separately.
> (I also remember someone in the past trying to fix something in this area,
> but couldn't find a PR nor an open bug.)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]