Currently, Spark Streaming does not support addition/deletion/modification
of DStreams after the streaming context has been started,
nor can you restart a stopped streaming context.
Also, multiple Spark contexts (and therefore multiple streaming contexts)
cannot be run concurrently in the same JVM.

To change the window duration, I would do one of the following.

1. Stop the previous streaming context, create a new streaming context, and
set up the DStreams once again with the new window duration (see the first
sketch after this list).
2. Create a custom DStream, say DynamicWindowDStream. Take a look at how
WindowedDStream
<https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala>
is implemented (pretty simple, just a union over RDDs across time). That
should allow you to modify the window duration. However, do make sure you
define a maximum window duration that you will never exceed, and make sure
you define parentRememberDuration
<https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala#L53>
as "rememberDuration + maxWindowDuration". That field defines which RDDs
can be forgotten, so it is sensitive to the window duration. Then you have
to take care of modifying the window duration correctly (atomically, etc.)
as per your requirements (see the second sketch after this list).
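
Here is a minimal sketch of approach 1, assuming a socket text source on
localhost:9999, a 5 second batch interval, and a hypothetical
startWithWindow helper (adapt it to your actual sources and outputs). The
key point is to stop the streaming context without stopping the underlying
SparkContext, so the SparkContext can be reused:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object DynamicWindowByRestart {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DynamicWindow"))

        // Hypothetical helper: wires up the DStreams for the given window
        // duration and starts the context.
        def startWithWindow(windowSecs: Long): StreamingContext = {
          val ssc = new StreamingContext(sc, Seconds(5)) // batch interval
          val lines = ssc.socketTextStream("localhost", 9999)
          lines.window(Seconds(windowSecs)).count().print()
          ssc.start()
          ssc
        }

        var ssc = startWithWindow(30)

        // ... later, when you want to switch to a 60 second window
        // (in a real app this would be triggered by some external event) ...
        ssc.stop(stopSparkContext = false) // keep the SparkContext for reuse
        ssc = startWithWindow(60)
        ssc.awaitTermination()
      }
    }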
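
And a sketch of approach 2, modeled on WindowedDStream. Note that
parentRememberDuration is private[streaming], so this class has to live
inside Spark's own streaming package (or you patch Spark itself); treat
this as an untested outline, not a drop-in implementation:

    package org.apache.spark.streaming.dstream

    import scala.reflect.ClassTag

    import org.apache.spark.rdd.{RDD, UnionRDD}
    import org.apache.spark.streaming.{Duration, Time}

    class DynamicWindowDStream[T: ClassTag](
        parent: DStream[T],
        maxWindowDuration: Duration)
      extends DStream[T](parent.ssc) {

      // Current window duration. @volatile so a change made from the driver
      // thread is visible to the job generator thread; swapping a single
      // reference keeps the update atomic.
      @volatile private var currentWindow: Duration = maxWindowDuration

      def setWindowDuration(d: Duration): Unit = {
        require(d <= maxWindowDuration,
          "window duration must not exceed maxWindowDuration")
        require(d.isMultipleOf(parent.slideDuration),
          "window duration must be a multiple of the parent's slide duration")
        currentWindow = d
      }

      override def dependencies: List[DStream[_]] = List(parent)

      override def slideDuration: Duration = parent.slideDuration

      // Remember enough parent RDDs for the largest window we may ever use.
      override def parentRememberDuration: Duration =
        rememberDuration + maxWindowDuration

      override def compute(validTime: Time): Option[RDD[T]] = {
        val window = currentWindow // read once per batch
        val rddsInWindow =
          parent.slice(validTime - window + parent.slideDuration, validTime)
        Some(new UnionRDD(ssc.sc, rddsInWindow))
      }
    }

From the driver you would then keep a reference to the DynamicWindowDStream
and call setWindowDuration(Seconds(...)) whenever you need to resize; the
next batch picks up the new value.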

Happy streaming!

TD

On Mon, Jun 2, 2014 at 2:46 PM, lbustelo <g...@bustelos.com> wrote:

> This is a general question about whether Spark Streaming can be interactive
> like batch Spark jobs. I've read plenty of threads and done a fair bit of
> experimentation, and I'm thinking the answer is NO, but it does not hurt to
> ask.
>
> More specifically, I would like to be able to:
> 1. Add/remove steps in the streaming job
> 2. Modify window durations
> 3. Stop and restart the context.
>
> I've tried the following:
>
> 1. Modify the DStream after it has been started… BOOM! Exceptions
> everywhere.
>
> 2. Stop the DStream, make modifications, start… NOT GOOD :( In 0.9.0 I was
> getting deadlocks. I also tried 1.0.0 and it did not work.
>
> 3. Based on information provided here
> <http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-the-spark-shell-tp3347p3371.html>,
> I was able to prototype modifying the RDD computation within a
> forEachRDD. That is nice, but you are then bound to the specified batch
> duration. That got me wanting to modify window durations. Is changing the
> window duration possible?
>
> 4. Tried running multiple streaming contexts from within a single driver
> application and got several exceptions. The first one was a bind exception
> on the web port. Then, once the app started running (cores were taken by
> the 1st job), it did not run correctly, with a lot of
> "akka.pattern.AskTimeoutException: Timed out".
>
> I've tried my experiments in 0.9.0, 0.9.1, and 1.0.0 running on a standalone
> cluster setup.
> Thanks in advance
>
