[
https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004052#comment-15004052
]
Cody Koeninger commented on SPARK-10320:
----------------------------------------
This is, for practical purposes, blocked on SPARK-10963.
> Kafka Support new topic subscriptions without requiring restart of the
> streaming context
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-10320
> URL: https://issues.apache.org/jira/browse/SPARK-10320
> Project: Spark
> Issue Type: New Feature
> Components: Streaming
> Reporter: Sudarshan Kadambi
>
> Spark Streaming lacks the ability to subscribe to new topics or unsubscribe
> from current ones once the streaming context has been started. Restarting the
> streaming context increases the latency of update handling.
> Consider a streaming application subscribed to n topics. Suppose one of the
> topics is no longer needed in streaming analytics and hence should be
> dropped. We could do this by stopping the streaming context, removing that
> topic from the topic list and restarting the streaming context. Since, with
> some DStreams such as DirectKafkaStream, the per-partition offsets are
> maintained by Spark, we should be able to resume uninterrupted (I think?)
> from where we left off with a minor delay. However, in instances where
> expensive state initialization (from an external datastore) may be needed for
> datasets published to all topics before streaming updates can be applied to
> them, it is more convenient to apply only the incremental changes to the
> topic list, subscribing or unsubscribing as needed. Without such a feature,
> updates go unprocessed for longer than necessary, thus affecting QoS.
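
For illustration only, the "incremental changes to the topic list" mentioned in the description can be sketched as a plain set difference. This is a minimal sketch, not part of any Spark or Kafka API: `diff_topics` and the topic names are hypothetical, and the code only computes which topics would be added or dropped. Applying that change to a running stream is exactly what this issue requests.

```python
def diff_topics(current, desired):
    """Return (to_subscribe, to_unsubscribe) for moving from `current` to `desired`.

    Hypothetical helper: both arguments are iterables of topic names.
    """
    current, desired = set(current), set(desired)
    # Topics present only in the desired set need a new subscription;
    # topics present only in the current set should be dropped.
    return desired - current, current - desired


# Hypothetical topic names, chosen only for this example.
to_subscribe, to_unsubscribe = diff_topics(
    {"orders", "clicks", "logs"},
    {"orders", "clicks", "payments"},
)
```

Topics that appear in both sets are left untouched, so any state already initialized for them would not need to be rebuilt.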
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)