[jira] [Updated] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,WrappedArray())

JESSE CHEN (JIRA) Tue, 03 Jan 2017 15:48:15 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


JESSE CHEN updated SPARK-19068:
-------------------------------
    Attachment: sparklog.tar.gz

This is the Spark console output in which you can find settings and sequence of 
events. At end you will see the "never-ending" event dropping messages. 

> Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: 
> SparkListenerBus has already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(41,WrappedArray())
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19068
>                 URL: https://issues.apache.org/jira/browse/SPARK-19068
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>         Environment: RHEL 7.2
>            Reporter: JESSE CHEN
>         Attachments: sparklog.tar.gz
>
>
> On a large cluster with 45TB RAM and 1,000 cores, we used 1008 executors in 
> order to use all RAM and cores for a 100TB Spark SQL workload. Long-running 
> queries tend to report the following ERRORs
> {noformat}
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(136,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(853,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(395,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(736,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(439,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(16,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(307,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(51,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(535,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(63,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(333,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(484,WrappedArray())
> ....(omitted) 
> {noformat}
> The message itself maybe a reasonable response to a already stopped 
> SparkListenerBus (so subsequent events are thrown away with that ERROR 
> message). The issue is that because SparkContext does NOT exit until all 
> these ERROR/events are reported, which is a huge number in our setup -- and 
> this can take, in some cases, hours!!!
> We tried increasing the 
> Adding default property: spark.scheduler.listenerbus.eventqueue.size=130000
> from 10K, this still occurs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,WrappedArray())

Reply via email to