[
https://issues.apache.org/jira/browse/SPARK-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-6714:
-----------------------------------
Assignee: (was: Apache Spark)
> additionally overload KafkaUtils.createDirectStream for using a
> messageHandler without having to specify the offsets
> --------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-6714
> URL: https://issues.apache.org/jira/browse/SPARK-6714
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.3.0
> Reporter: Juan RodrĂguez Hortalá
> Priority: Trivial
> Labels: kafka, streaming
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> Currently, in the Scala API, KafkaUtils.createDirectStream has two overload
> methods, one for an "easy mode" where only the Kafka parameters and topics
> are specified, and other "hard mode" where we can specify the offsets and a
> messageHandler for manipulating the MessageAndMetadata values obtained from
> Kafka. I think an intermediate method that automatically handles the offsets,
> but that allows you to specify the messageHandler would be very useful. For
> example the triple (topic, partition, offset) uniquely identifies each
> message, so that could be useful as primary key for idempotent insertions in
> an external database. Also, in an implementation of a lambda architecture,
> offsets could be used to trace which part of a kafka topic has been covered
> by the batch view, and which part by the real-time / live view. For both
> cases I think that having access to the meta information through a
> messageHandler, while maintaining managed offsets, would be interesting
> A proposal for the implementation is available at
> https://github.com/juanrh/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala.
> A new overload of KafkaUtils.createDirectStream for the Scala API, and
> another for the Java API, have been added
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]