Hi, we have spark 1.6.1 streaming from Kafka (0.10.1) topic using direct
approach. The streaming part works fine but when we initially start the
job, we have to deal with really huge Kafka message backlog, millions of
messages, and that first batch runs for over 40 hours,  and after 12 hours
or so it becomes very very slow, it keeps crunching messages, but at a very
low speed. Any idea how to overcome this issue? Once the job is all caught
up, subsequent batches are quick and fast since the load is really tiny to
process. So any idea how to avoid this problem?

Reply via email to