Mathieu D created SPARK-18881:
---------------------------------
Summary: Spark never finishes jobs and stages, JobProgressListener fails
Key: SPARK-18881
URL: https://issues.apache.org/jira/browse/SPARK-18881
Project: Spark
Issue Type: Bug
Affects Versions: 2.0.2
Environment: yarn, deploy-mode = client
Reporter: Mathieu D
We have a Spark application that continuously processes a large number of
incoming jobs. Several jobs are processed in parallel, on multiple threads.
During intensive workloads, at some point, we start to see hundreds of
warnings like these:
{code}
16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610
16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405
16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622
{code}
From that point on, the performance of the app plummets and most stages and
jobs never finish. In the Spark UI, I can see figures like 13000 pending jobs.
I can't clearly identify a related exception occurring before these warnings.
Maybe this one, but it concerns another listener:
{code}
16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no
remaining room in event queue. This likely means one of the SparkListeners is
too slow and cannot keep up with the rate at which tasks are being started by
the scheduler.
16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu
Jan 01 01:00:00 CET 1970
{code}
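One possible link between the two log excerpts, sketched as a toy model in plain Python. This is not Spark's implementation: `BoundedBus` and `ToyProgressListener` are hypothetical names, and the model only assumes that events posted to a full queue are discarded and that the listener learns about a stage from its "submitted" event.

```python
from collections import deque


class BoundedBus:
    """Toy event bus: events posted while the queue is full are dropped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def post(self, event):
        # Drop the event instead of blocking when there is no room left.
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(event)
        return True

    def drain(self, listener):
        # Deliver all queued events to the listener, in order.
        while self.queue:
            listener.on_event(self.queue.popleft())


class ToyProgressListener:
    """Remembers submitted stages and warns on task ends for any other stage."""

    def __init__(self):
        self.active_stages = set()
        self.warnings = []

    def on_event(self, event):
        kind, stage_id = event
        if kind == "stage_submitted":
            self.active_stages.add(stage_id)
        elif kind == "task_end" and stage_id not in self.active_stages:
            self.warnings.append(f"Task end for unknown stage {stage_id}")


bus = BoundedBus(capacity=2)
listener = ToyProgressListener()

bus.post(("stage_submitted", 1))
bus.post(("task_end", 1))
bus.post(("stage_submitted", 2))  # queue is full, so this event is dropped
bus.drain(listener)

bus.post(("task_end", 2))  # the listener never saw stage 2 being submitted
bus.drain(listener)

print(bus.dropped, listener.warnings)
```

In this model a single dropped "submitted" event is enough to produce an "unknown stage" warning later on. Whether the same mechanism explains the behaviour reported here is unconfirmed.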
This failure mode is very problematic for us, since it is hard to detect and
requires an app restart.
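As a stopgap for detection, the driver log could be scanned for the messages quoted above. The following is a minimal sketch, assuming the log lines keep the exact wording shown in this report. The category names, the `count_symptoms` and `needs_attention` helpers, and the threshold policy are all hypothetical.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts in this report; category names are illustrative.
PATTERNS = {
    "unknown_stage": re.compile(
        r"WARN JobProgressListener: Task (?:start|end) for unknown stage \d+"
    ),
    "unknown_job": re.compile(
        r"WARN JobProgressListener: Job completed for unknown job \d+"
    ),
    "dropped_event": re.compile(
        r"ERROR LiveListenerBus: Dropping SparkListenerEvent"
    ),
}


def count_symptoms(lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def needs_attention(counts, threshold=1):
    """Hypothetical policy: flag once any symptom was seen `threshold` times."""
    return any(n >= threshold for n in counts.values())


if __name__ == "__main__":
    sample = [
        "16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379",
        "16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610",
    ]
    counts = count_symptoms(sample)
    print(dict(counts), needs_attention(counts))
```

Feeding it the lines of a driver log file (for example `count_symptoms(open(path))`, with `path` chosen by the operator) would give a per-symptom count that a monitoring job could alert on.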