[ 
https://issues.apache.org/jira/browse/SPARK-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6714:
-----------------------------------

    Assignee:     (was: Apache Spark)

> additionally overload KafkaUtils.createDirectStream for using a 
> messageHandler without having to specify the offsets
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6714
>                 URL: https://issues.apache.org/jira/browse/SPARK-6714
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.0
>            Reporter: Juan Rodríguez Hortalá
>            Priority: Trivial
>              Labels: kafka, streaming
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Currently, in the Scala API, KafkaUtils.createDirectStream has two 
> overloads: an "easy mode" where only the Kafka parameters and topics are 
> specified, and a "hard mode" where we can specify the offsets and a 
> messageHandler for manipulating the MessageAndMetadata values obtained from 
> Kafka. I think an intermediate overload that handles the offsets 
> automatically but still lets you specify the messageHandler would be very 
> useful. For example, the triple (topic, partition, offset) uniquely 
> identifies each message, so it could serve as a primary key for idempotent 
> insertions into an external database. Also, in an implementation of a lambda 
> architecture, offsets could be used to trace which part of a Kafka topic has 
> been covered by the batch view and which part by the real-time / live view. 
> In both cases, I think that access to the message metadata through a 
> messageHandler, while the offsets remain managed automatically, would be 
> valuable.
> A proposed implementation, which adds a new overload of 
> KafkaUtils.createDirectStream for the Scala API and another for the Java 
> API, is available at 
> https://github.com/juanrh/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
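
To make the idempotent-insertion use case concrete, here is a minimal sketch 
in plain Scala with no Spark or Kafka dependency. MessageMeta is a 
hypothetical stand-in for Kafka's MessageAndMetadata, reduced to the fields 
used here. KeyedRecord, handler and upsert are also illustrative names and 
are not part of the proposal. A function shaped like handler is what one 
would presumably pass as the messageHandler argument.

```scala
// Hypothetical stand-in for kafka.message.MessageAndMetadata,
// reduced to the fields this sketch needs.
final case class MessageMeta[K, V](
    topic: String, partition: Int, offset: Long, key: K, message: V)

// Hypothetical output type: the payload together with the
// (topic, partition, offset) triple that uniquely identifies it.
final case class KeyedRecord[V](
    topic: String, partition: Int, offset: Long, payload: V) {
  def primaryKey: (String, Int, Long) = (topic, partition, offset)
}

object MessageHandlerSketch {
  // The kind of function that would be supplied as messageHandler:
  // it keeps the metadata instead of discarding it.
  def handler[K, V](mmd: MessageMeta[K, V]): KeyedRecord[V] =
    KeyedRecord(mmd.topic, mmd.partition, mmd.offset, mmd.message)

  // Idempotent insertion, with an immutable Map standing in for the
  // external database: writing the same record again changes nothing.
  def upsert[V](
      store: Map[(String, Int, Long), V],
      rec: KeyedRecord[V]): Map[(String, Int, Long), V] =
    store.updated(rec.primaryKey, rec.payload)
}
```

Because the key is derived from the message position rather than from its 
content, replaying a batch after a failure overwrites existing rows instead 
of creating duplicates.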



