[ 
https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016579#comment-16016579
 ] 

Antoine PRANG commented on SPARK-18838:
---------------------------------------

[~joshrosen][~sitalke...@gmail.com] I have measured the listener execution time 
on my test case.
Below you can find, for each listener, the average percentage of the total 
execution time per message (I instrumented ListenerBus):
EnvironmentListener       :0.0
EventLoggingListener      :48.2
ExecutorsListener         :0.1
HeartbeatReceiver         :0.0
JobProgressListener       :6.9
PepperdataSparkListener   :2.7
RDDOperationGraphListener :0.1  
StorageListener           :38.3
StorageStatusListener     :0.4
The execution time is concentrated in two listeners: EventLoggingListener and 
StorageListener.
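
The concentration can be checked with a few lines of arithmetic. The snippet below is only a sketch in Python (not Spark code) that copies the percentages listed above; the variable names are illustrative. The two largest entries add up to about 86.5% of the per-message execution time.

```python
# Average share of per-message execution time, in percent (copied from the list above).
shares = {
    "EnvironmentListener": 0.0,
    "EventLoggingListener": 48.2,
    "ExecutorsListener": 0.1,
    "HeartbeatReceiver": 0.0,
    "JobProgressListener": 6.9,
    "PepperdataSparkListener": 2.7,
    "RDDOperationGraphListener": 0.1,
    "StorageListener": 38.3,
    "StorageStatusListener": 0.4,
}

# Pick the two listeners with the largest share and add their shares.
top_two = sorted(shares, key=shares.get, reverse=True)[:2]
top_two_share = round(sum(shares[name] for name in top_two), 1)

print(top_two, top_two_share)
```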
I do not think that parallelizing at the listener bus level is a good idea. 
Duplicating the messages into 2 queues would change the current synchronization 
contract (all listeners receive each message at the same time; a listener is 
ahead of or behind another listener by at most one message).
In my opinion, the best approach is to keep the listener bus as simple as it is 
now (N producers, 1 consumer), so that it can dequeue as fast as possible, and 
to introduce parallelization at the listener level where that is possible, 
keeping the synchronization contract in mind. The EventLoggingListener, for 
example, can be executed asynchronously.
I am working on a commit that does this right now.
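
A minimal sketch of the listener-level approach described above, written in Python rather than Spark's Scala and not taken from the actual commit. The names `AsyncListener`, `RecordingListener`, and `dispatch` are hypothetical. The bus loop stays single-consumer; only the slow listener is wrapped so that the bus thread hands the event off and moves on. The hand-off queue is unbounded here for brevity, and the wrapped listener may lag the others by more than one message, which is the part of the synchronization contract that has to be relaxed deliberately.

```python
import queue
import threading


class AsyncListener:
    """Wraps a listener so on_event() only enqueues; a private worker does the work.

    Hypothetical helper for illustration, not a Spark class.
    """

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, delegate):
        self._delegate = delegate
        self._events = queue.Queue()  # unbounded in this sketch
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_event(self, event):
        # Cheap hand-off: the bus thread never waits for the slow delegate.
        self._events.put(event)

    def _run(self):
        # A single worker preserves the order in which events were received.
        while True:
            event = self._events.get()
            if event is self._STOP:
                return
            self._delegate.on_event(event)

    def stop(self):
        # Let the worker drain everything already queued, then shut it down.
        self._events.put(self._STOP)
        self._worker.join()


class RecordingListener:
    """Stand-in listener that records the events it receives."""

    def __init__(self):
        self.seen = []

    def on_event(self, event):
        self.seen.append(event)


def dispatch(events, listeners):
    """Single-consumer bus loop: every listener gets each event, in order."""
    for event in events:
        for listener in listeners:
            listener.on_event(event)


storage = RecordingListener()    # stays synchronous
event_log = RecordingListener()  # imagine this one is slow (disk I/O)
async_event_log = AsyncListener(event_log)

dispatch(["job_start", "task_start", "task_end"], [storage, async_event_log])
async_event_log.stop()  # wait until the worker has processed the backlog
```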

   

> High latency of event processing for large jobs
> -----------------------------------------------
>
>                 Key: SPARK-18838
>                 URL: https://issues.apache.org/jira/browse/SPARK-18838
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Sital Kedia
>
> Currently we are observing very high event processing delay in the 
> driver's `ListenerBus` for large jobs with many tasks. Many critical 
> components of the scheduler, such as `ExecutorAllocationManager` and 
> `HeartbeatReceiver`, depend on the `ListenerBus` events, and this delay might 
> hurt job performance significantly or even fail the job.  For example, a 
> significant delay in receiving the `SparkListenerTaskStart` event might cause 
> `ExecutorAllocationManager` to mistakenly remove an executor which is 
> not idle.  
> The problem is that the event processor in `ListenerBus` is a single thread 
> which loops through all the listeners for each event and processes each event 
> synchronously 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94.
>  This single-threaded processor often becomes the bottleneck for large jobs.  
> Also, if one of the listeners is very slow, all the listeners pay the 
> price of the delay incurred by the slow listener. In addition, a slow 
> listener can cause events to be dropped from the event queue, which might be 
> fatal to the job.
> To solve the above problems, we propose to get rid of the event queue and the 
> single-threaded event processor. Instead, each listener will have its own 
> dedicated single-threaded executor service. Whenever an event is posted, it 
> will be submitted to the executor service of every listener. The 
> single-threaded executor service will guarantee in-order processing of the 
> events per listener.  The queue used for the executor service will be bounded 
> to guarantee that memory does not grow indefinitely. The downside of this 
> approach is that a separate event queue per listener will increase the driver 
> memory footprint. 
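
The proposal in the issue description can be sketched roughly as follows. This is a Python illustration under stated assumptions, not Spark code: `ListenerWorker`, `Bus`, the `capacity` parameter, and the drop-when-full policy are all hypothetical choices made for the example. Each listener gets its own bounded queue and its own single thread, so events stay in order per listener and a slow listener no longer delays the others.

```python
import queue
import threading
import time


class ListenerWorker:
    """One bounded queue plus one thread per listener (hypothetical sketch)."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, listener, capacity):
        self._listener = listener
        self._events = queue.Queue(maxsize=capacity)  # bounded: caps memory use
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, event):
        # Assumed policy: never block the poster; count the event as dropped
        # when this listener's queue is full.
        try:
            self._events.put_nowait(event)
        except queue.Full:
            self.dropped += 1

    def _run(self):
        # A single consumer thread gives in-order processing for this listener.
        while True:
            event = self._events.get()
            if event is self._STOP:
                return
            self._listener(event)

    def stop(self):
        self._events.put(self._STOP)  # blocks until the queue has room
        self._thread.join()


class Bus:
    """Posts every event to the worker of every listener."""

    def __init__(self, listeners, capacity=1000):
        self._workers = [ListenerWorker(l, capacity) for l in listeners]

    def post(self, event):
        for worker in self._workers:
            worker.submit(event)

    def stop(self):
        for worker in self._workers:
            worker.stop()

    def dropped(self):
        return [worker.dropped for worker in self._workers]


fast_seen, slow_seen = [], []


def slow_listener(event):
    time.sleep(0.001)  # simulate an expensive listener
    slow_seen.append(event)


bus = Bus([fast_seen.append, slow_listener], capacity=1000)
for i in range(100):
    bus.post(i)
bus.stop()  # drain both queues before inspecting the results
```

With 100 events and a capacity of 1000, neither queue can fill up, so both listeners see every event in posting order. With a smaller capacity the slow listener would start dropping events while the fast one would be unaffected; the memory cost is one queue per listener, which is the downside the description mentions.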


