Hi all,

I'm running Spark Streaming with the Kafka direct stream, but after
a couple of days of running, the batch processing time almost doubles.
I didn't find any slowdown in the JVM GC logs, but I did find that the
time spent reading Spark broadcast variables keeps increasing.
Initially it takes less than 10 ms, but after 3 days it takes more than
60 ms. It's really puzzling, since I don't use broadcast variables at
all.
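
One way to quantify that trend is to pull the timings out of the executor logs. The sketch below is illustrative only: it assumes log lines of the form "Reading broadcast variable <id> took <n> ms" (the exact wording may differ between Spark versions), and the function name and the sample lines are hypothetical, made up for the example rather than taken from this application.

```python
import re
from statistics import mean

# Assumed log format; adjust the pattern if your Spark version logs differently.
PATTERN = re.compile(r"Reading broadcast variable (\d+) took (\d+) ms")


def broadcast_read_times(lines):
    """Return (broadcast_id, millis) tuples for every matching log line."""
    times = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            times.append((int(match.group(1)), int(match.group(2))))
    return times


# Synthetic sample lines, invented for illustration (not real output).
sample = [
    "16/06/10 00:00:01 INFO TorrentBroadcast: Reading broadcast variable 12 took 8 ms",
    "16/06/10 00:00:02 INFO Executor: Running task 0.0 in stage 12.0",
    "16/06/13 00:00:01 INFO TorrentBroadcast: Reading broadcast variable 90412 took 64 ms",
]

times = broadcast_read_times(sample)
average_ms = mean(ms for _, ms in times)
print(times, average_ms)
```

Running the same function over logs from day 1 and day 3 separately would give two averages to compare against the batch processing times.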

My application needs to run 24/7, so I hope there's something I'm
missing that would correct this behavior.

FYI, we're running on AWS EMR with Spark 1.6.1 in YARN client mode.
I've attached the Spark application's environment settings file.

--
John Simon


environment.txt (7K) 
<http://apache-spark-user-list.1001560.n3.nabble.com/attachment/27138/0/environment.txt>



