[ 
https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717220#comment-14717220
 ] 

Sudarshan Kadambi commented on SPARK-10320:
-------------------------------------------

There is ingest-time analytics (independent application of transforms over 
data published to individual topics) and query-time analytics (user queries 
that require joins across RDDs holding the transformed data). However, even 
ingest-time analytics will potentially require joins across data published to 
different topics. For these reasons, this needs to be a single Spark Streaming 
application.

> Support new topic subscriptions without requiring restart of the streaming 
> context
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-10320
>                 URL: https://issues.apache.org/jira/browse/SPARK-10320
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Sudarshan Kadambi
>
> Spark Streaming lacks the ability to subscribe to new topics or unsubscribe 
> from current ones once the streaming context has been started. Restarting the 
> streaming context increases the latency of update handling.
> Consider a streaming application subscribed to n topics. Let's say one of the 
> topics is no longer needed in streaming analytics and hence should be 
> dropped. We could do this by stopping the streaming context, removing that 
> topic from the topic list and restarting the streaming context. Since, with 
> some DStreams such as DirectKafkaStream, the per-partition offsets are 
> maintained by Spark, we should be able to resume uninterrupted (I think?) 
> from where we left off with a minor delay. However, in instances where 
> expensive state initialization (from an external datastore) may be needed for 
> datasets published to all topics before streaming updates can be applied to 
> them, it is more convenient to apply only the incremental changes to the 
> topic list, subscribing or unsubscribing as needed. Without such a feature, 
> updates go unprocessed for longer than they need to, thus affecting QoS.
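
The restart-based workaround in the quoted description can be sketched in
plain Python. This is only a model of the offset bookkeeping and uses no Spark
APIs: the function name `restart_with_topics`, the offsets dictionary layout,
and the sample values are all hypothetical. It assumes offsets for topics that
stay subscribed are carried over, offsets for dropped topics are discarded,
and partitions of newly added topics start at an assumed initial offset.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical model: (topic, partition) -> next offset to read.
Offsets = Dict[Tuple[str, int], int]


def restart_with_topics(
    saved: Offsets,
    new_topics: Iterable[str],
    partitions: Dict[str, int],
    initial_offset: int = 0,
) -> Offsets:
    """Compute starting offsets for a context restarted with a new topic list.

    Offsets for topics that remain subscribed are carried over, offsets for
    dropped topics are discarded, and partitions of newly added topics start
    at `initial_offset` (an assumption; a real job would choose a policy).
    """
    resumed: Offsets = {}
    for topic in new_topics:
        for partition in range(partitions[topic]):
            key = (topic, partition)
            resumed[key] = saved.get(key, initial_offset)
    return resumed


# Offsets saved when the old context was stopped (illustrative values).
saved = {("a", 0): 120, ("a", 1): 95, ("b", 0): 40}

# Drop topic "b", keep topic "a", and add topic "c".
resumed = restart_with_topics(saved, ["a", "c"], {"a": 2, "c": 1})
```

The sketch covers only where reading would resume. It does not model the
expensive state initialization mentioned in the description, which is the
cost a full restart would repeat for every topic.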



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
