[
https://issues.apache.org/jira/browse/SPARK-18955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-18955:
---------------------------------
Labels: bulk-closed features newbie (was: features newbie)
> Add ability to emit kafka events to DStream or KafkaDStream
> -----------------------------------------------------------
>
> Key: SPARK-18955
> URL: https://issues.apache.org/jira/browse/SPARK-18955
> Project: Spark
> Issue Type: New Feature
> Components: DStreams, PySpark
> Affects Versions: 2.0.2
> Reporter: Russell Jurney
> Priority: Major
> Labels: bulk-closed, features, newbie
>
> Any I/O in Spark Streaming seems to have to be done inside a
> DStream.foreachRDD loop. For instance, in PySpark, to emit a Kafka event
> for each record I have to call DStream.foreachRDD and use kafka-python
> inside that loop.
> I/O like this really seems like it should be part of the pyspark.streaming
> or pyspark.streaming.kafka API and the equivalent Scala APIs. Something like
> DStream.emitKafkaEvents or KafkaDStream.emitKafkaEvents would make sense.
> If this is a good idea, and it seems feasible, I'd like to take a crack at it
> as my first patch for Spark. Advice would be appreciated. What would need to
> be modified to make this happen?
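The workaround the issue describes can be sketched as follows. This is a minimal sketch only: the names `emit_kafka_events`, `send_partition`, and `serialize_record`, the topic name, and the broker address are all hypothetical, and the producer is kafka-python's KafkaProducer as mentioned in the report.

```python
import json


def serialize_record(record):
    # Hypothetical helper: encode a record dict as UTF-8 JSON bytes for Kafka.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def send_partition(records, bootstrap_servers="localhost:9092", topic="events"):
    # Create one producer per partition, on the executor. Importing and
    # constructing the producer here avoids shipping an unpicklable
    # KafkaProducer inside the closure sent from the driver.
    from kafka import KafkaProducer  # kafka-python, as cited in the issue
    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for record in records:
        producer.send(topic, serialize_record(record))
    producer.flush()
    producer.close()


def emit_kafka_events(dstream, topic="events"):
    # The foreachRDD pattern the issue wants to replace with a built-in
    # DStream.emitKafkaEvents: iterate each micro-batch RDD, and produce
    # the records of each partition to Kafka.
    dstream.foreachRDD(
        lambda rdd: rdd.foreachPartition(
            lambda partition: send_partition(partition, topic=topic)))
```

A built-in API would essentially move this per-partition producer management into Spark itself, so users call one method instead of re-implementing the foreachRDD/foreachPartition boilerplate.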
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)