Assume the batch interval is 10 seconds and the batch processing time is 30 
seconds. Each 30-second processing cycle consumes only 10 seconds' worth of 
data while another 30 seconds' worth arrives, so the backlog grows by 20 
seconds of data per batch: while Spark Streaming is processing the first 
batch, the receiver builds up a backlog of 20 seconds' worth of data, and by 
the time batch #2 finishes, the receiver is holding 40 seconds' worth of data 
in its memory buffer. Assuming data streams in consistently at the same rate, 
this backlog keeps growing without bound, and since the received data is 
buffered in memory, you eventually run out of it.
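
For concreteness, here is a minimal sketch of that setup (the app name, host, 
port, and the per-batch work are all placeholders, not anything from a real 
job):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // A new batch of received data is created every 10 seconds.
    val conf = new SparkConf().setAppName("BacklogDemo")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder receiver-based source.
    val lines = ssc.socketTextStream("localhost", 9999)

    // If the work done per batch takes ~30 seconds, each batch falls a
    // further 20 seconds behind and received blocks pile up in memory.
    lines.foreachRDD { rdd =>
      rdd.count()  // stand-in for an expensive per-batch computation
    }

    ssc.start()
    ssc.awaitTermination()

You can watch this happen as a steadily growing scheduling delay on the 
Streaming tab of the web UI.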

Also keep in mind that windowing operations on a DStream implicitly persist 
every RDD in that DStream in memory, because each window still needs the RDDs 
from earlier batches that it spans.
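
Continuing the sketch above, a windowed count over the last 60 seconds, 
sliding every 10 seconds:

    // To compute each 60-second window, Spark Streaming must hold on to
    // the last six batch RDDs, so they are persisted in memory implicitly.
    val windowedCounts = lines.window(Seconds(60), Seconds(10)).count()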

Mohammed

-----Original Message-----
From: Jacek Laskowski [mailto:ja...@japila.pl] 
Sent: Thursday, August 4, 2016 4:25 PM
To: Mohammed Guller
Cc: Saurav Sinha; user
Subject: Re: Explanation regarding Spark Streaming

On Fri, Aug 5, 2016 at 12:48 AM, Mohammed Guller <moham...@glassbeam.com> wrote:
> and eventually you will run out of memory.

Why? Mind elaborating?

Jacek
