Spark Streaming from Kafka, deal with initial heavy load.

sagarcasual . Fri, 17 Mar 2017 21:53:57 -0700

Hi, we are running Spark Streaming 1.6.1 against a Kafka (0.10.1) topic using
the direct approach. The streaming part works fine, but when we first start
the job it has to deal with a huge Kafka message backlog, millions of
messages. That first batch runs for over 40 hours, and after 12 hours or so
it becomes very slow: it keeps crunching messages, but at a very low rate.
Once the job has caught up, subsequent batches are fast because the load per
batch is tiny. Any idea how to avoid this problem with the initial backlog?
