[
https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016217#comment-16016217
]
Sital Kedia commented on SPARK-18838:
-------------------------------------
[~joshrosen] - >> Alternatively, we could use two queues, one for internal
listeners and another for external ones. This wouldn't be as fine-grained as
thread-per-listener but might buy us a lot of the benefits with perhaps less
code needed.
Actually, that is exactly what my PR does:
https://github.com/apache/spark/pull/16291. I have not been able to work on it
recently, but please take a look and let me know what you think. I can
prioritize working on it.
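
For readers unfamiliar with the two-queue idea quoted above, here is a minimal Java sketch. It is an illustration only and is not taken from the pull request: `TwoQueueBus`, `addInternal`, `addExternal`, `post` and `stop` are hypothetical names, and the queues are left unbounded for brevity. Each group of listeners gets its own FIFO queue and dispatch thread, so a slow external listener cannot delay the internal ones.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: TwoQueueBus is an illustrative name, not a Spark class
// and not the code in the pull request.
class TwoQueueBus<E> {
    private final List<Consumer<E>> internal = new CopyOnWriteArrayList<>();
    private final List<Consumer<E>> external = new CopyOnWriteArrayList<>();
    // Each single-threaded executor owns one FIFO queue and one dispatch thread.
    // Simplification: these queues are unbounded.
    private final ExecutorService internalQueue = Executors.newSingleThreadExecutor();
    private final ExecutorService externalQueue = Executors.newSingleThreadExecutor();

    void addInternal(Consumer<E> listener) { internal.add(listener); }
    void addExternal(Consumer<E> listener) { external.add(listener); }

    // Enqueues the event once per group; each group processes it on its own thread.
    void post(E event) {
        internalQueue.execute(() -> internal.forEach(l -> l.accept(event)));
        externalQueue.execute(() -> external.forEach(l -> l.accept(event)));
    }

    // Stops accepting events and waits for both queues to drain.
    void stop() {
        internalQueue.shutdown();
        externalQueue.shutdown();
        try {
            internalQueue.awaitTermination(30, TimeUnit.SECONDS);
            externalQueue.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        TwoQueueBus<String> bus = new TwoQueueBus<>();
        bus.addInternal(e -> System.out.println("internal listener saw " + e));
        bus.addExternal(e -> System.out.println("external listener saw " + e));
        bus.post("task-start");
        bus.stop();
    }
}
```

Within a group, listeners still run one after another, so this is coarser than thread-per-listener: a slow external listener still delays the other external listeners.
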
> High latency of event processing for large jobs
> -----------------------------------------------
>
> Key: SPARK-18838
> URL: https://issues.apache.org/jira/browse/SPARK-18838
> Project: Spark
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Sital Kedia
>
> Currently we are observing very high event processing delay in the
> driver's `ListenerBus` for large jobs with many tasks. Many critical
> components of the scheduler, such as `ExecutorAllocationManager` and
> `HeartbeatReceiver`, depend on `ListenerBus` events, and this delay can
> hurt job performance significantly or even fail the job. For example, a
> significant delay in receiving `SparkListenerTaskStart` might cause
> `ExecutorAllocationManager` to mistakenly remove an executor that is
> not idle.
> The problem is that the event processor in `ListenerBus` is a single thread
> that loops through all the listeners for each event and processes each event
> synchronously
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94.
> This single-threaded processor often becomes the bottleneck for large jobs.
> Also, if one of the listeners is very slow, all the listeners pay the
> price of the delay incurred by the slow listener. In addition, a slow
> listener can cause events to be dropped from the event queue, which might be
> fatal to the job.
> To solve the above problems, we propose to get rid of the event queue and the
> single-threaded event processor. Instead, each listener will have its own
> dedicated single-threaded executor service. Whenever an event is posted, it
> will be submitted to the executor service of every listener. The
> single-threaded executor service will guarantee in-order processing of
> events per listener. The queue used by each executor service will be bounded
> to guarantee that memory does not grow indefinitely. The downside of this
> approach is that a separate event queue per listener will increase the driver
> memory footprint.
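
The proposal in the description can be sketched roughly as follows in Java. This is only an illustration under stated assumptions, not Spark code: `PerListenerBus`, `addListener`, `post`, `droppedCount` and `stop` are hypothetical names, and dropping an event when a listener's queue is full is an assumed policy that the description does not specify.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: PerListenerBus and its methods are illustrative names, not Spark APIs.
class PerListenerBus<E> {
    // Pairs a listener with its own single-threaded, bounded executor.
    private static final class Slot<E> {
        final Consumer<E> listener;
        final ThreadPoolExecutor executor;

        Slot(Consumer<E> listener, int capacity) {
            this.listener = listener;
            // Exactly one worker thread and a bounded FIFO queue, so events are
            // handled in submission order and memory use stays bounded.
            this.executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(capacity));
        }
    }

    private final List<Slot<E>> slots = new CopyOnWriteArrayList<>();
    private final AtomicLong dropped = new AtomicLong();
    private final int queueCapacity;

    PerListenerBus(int queueCapacity) { this.queueCapacity = queueCapacity; }

    void addListener(Consumer<E> listener) {
        slots.add(new Slot<>(listener, queueCapacity));
    }

    // Submits the event to every listener's executor without blocking the poster.
    void post(E event) {
        for (Slot<E> slot : slots) {
            try {
                slot.executor.execute(() -> slot.listener.accept(event));
            } catch (RejectedExecutionException e) {
                // Assumption: when a listener's queue is full, drop and count the event.
                dropped.incrementAndGet();
            }
        }
    }

    long droppedCount() { return dropped.get(); }

    // Stops accepting events and waits for queued ones to drain.
    void stop() {
        for (Slot<E> slot : slots) slot.executor.shutdown();
        for (Slot<E> slot : slots) {
            try {
                slot.executor.awaitTermination(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        PerListenerBus<String> bus = new PerListenerBus<>(100);
        bus.addListener(e -> System.out.println("fast listener saw " + e));
        bus.addListener(e -> {
            try { Thread.sleep(50); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); }
            System.out.println("slow listener saw " + e);
        });
        for (int i = 0; i < 3; i++) bus.post("event-" + i);
        bus.stop();
        System.out.println("dropped: " + bus.droppedCount());
    }
}
```

With this layout a slow listener only fills its own queue, so the other listeners keep receiving events on time. The cost is one thread and one bounded queue per listener, which is the memory trade-off the description mentions.
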