Sudarshan Kadambi created SPARK-10320:
-----------------------------------------

             Summary: Support new topic subscriptions without requiring restart 
of the streaming context
                 Key: SPARK-10320
                 URL: https://issues.apache.org/jira/browse/SPARK-10320
             Project: Spark
          Issue Type: New Feature
          Components: Streaming
            Reporter: Sudarshan Kadambi


Spark Streaming lacks the ability to subscribe to new topics or unsubscribe 
from current ones once the streaming context has been started. Restarting the 
streaming context increases the latency of update handling.

Consider a streaming application subscribed to n topics. Suppose one of the 
topics is no longer needed for streaming analytics and should be dropped. 
We could do this by stopping the streaming context, removing that topic from 
the topic list, and restarting the streaming context. With some DStreams, 
such as DirectKafkaStream, the per-partition offsets are maintained by Spark, 
so we should (I think) be able to resume from where we left off with only a 
minor delay. However, where expensive state initialization (from an external 
datastore) is needed for the datasets published to all topics before 
streaming updates can be applied to them, it is more convenient to subscribe 
or unsubscribe only to the incremental changes to the topic list. Without 
such a feature, updates go unprocessed for longer than necessary, which 
affects QoS.
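
As a rough illustration of the incremental behaviour requested above, the 
sketch below computes which topics to add and which to drop, and runs the 
expensive initialization only for newly added topics. It is plain Python and 
not Spark API: diff_topics, TopicRegistry and init_state are hypothetical 
names introduced here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set, Tuple


def diff_topics(current: Iterable[str],
                desired: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_subscribe, to_unsubscribe) between two topic lists."""
    current_set, desired_set = set(current), set(desired)
    return desired_set - current_set, current_set - desired_set


@dataclass
class TopicRegistry:
    """Hypothetical stand-in for the subscription state of a running stream."""
    # Assumed to be the expensive per-topic initialization
    # (for example, a load from an external datastore).
    init_state: Callable[[str], object]
    state: Dict[str, object] = field(default_factory=dict)

    def update(self, desired: Iterable[str]) -> Tuple[Set[str], Set[str]]:
        """Apply only the incremental change to the topic list."""
        added, removed = diff_topics(self.state.keys(), desired)
        for topic in removed:
            # Drop state for topics that are no longer needed.
            del self.state[topic]
        for topic in added:
            # Initialize state for new topics only; existing topics keep theirs.
            self.state[topic] = self.init_state(topic)
        return added, removed
```

With a stop-and-restart approach, state for every topic would have to be 
initialized again; here a change from topics (a, b, c) to (a, c, d) touches 
only b and d.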



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

