Russell Jurney created SPARK-18955:
--------------------------------------

             Summary: Add ability to emit Kafka events to DStream or KafkaDStream
                 Key: SPARK-18955
                 URL: https://issues.apache.org/jira/browse/SPARK-18955
             Project: Spark
          Issue Type: New Feature
          Components: DStreams, PySpark
    Affects Versions: 2.0.2
            Reporter: Russell Jurney


Any output I/O in Spark Streaming seems to have to be done inside a 
DStream.foreachRDD loop. For instance, in PySpark, if I want to emit a Kafka 
event for each record, I have to call DStream.foreachRDD myself and use 
kafka-python to produce the events.
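
For concreteness, here is a minimal sketch of that workaround as it stands 
today. The broker address, topic name, and record encoding are illustrative 
assumptions, and the producer is created once per partition because 
kafka-python's KafkaProducer is not serializable:

from kafka import KafkaProducer  # kafka-python

def send_partition(records):
    # Create the producer on the executor; it cannot be pickled on the driver.
    producer = KafkaProducer(bootstrap_servers='localhost:9092')  # assumed broker
    for record in records:
        # Assumed topic name and UTF-8 string encoding.
        producer.send('events', str(record).encode('utf-8'))
    producer.flush()

# dstream is an existing pyspark.streaming.DStream
dstream.foreachRDD(lambda rdd: rdd.foreachPartition(send_partition))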

I/O like this really seems like it should be part of the pyspark.streaming or 
pyspark.streaming.kafka API, with equivalent Scala APIs. Something like 
DStream.emitKafkaEvents or KafkaDStream.emitKafkaEvents would make sense, as 
sketched below.
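
For illustration, usage of the proposed method might look something like this 
(emitKafkaEvents and its parameter names are hypothetical, not an existing 
API):

dstream.emitKafkaEvents(
    bootstrap_servers='localhost:9092',  # hypothetical parameter
    topic='events',                      # hypothetical parameter
    value_serializer=lambda record: str(record).encode('utf-8'))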

If this is a good idea, and it seems feasible, I'd like to take a crack at it 
as my first patch for Spark. Advice would be appreciated. What would need to be 
modified to make this happen?


