[
https://issues.apache.org/jira/browse/SPARK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290747#comment-14290747
]
Cody Koeninger commented on SPARK-4964:
---------------------------------------
Design doc at
https://docs.google.com/a/databricks.com/document/d/1IuvZhg9cOueTf1mq4qwc1fhPb5FVcaRLcyjrtG4XU1k/edit?usp=sharing
> Exactly-once semantics for Kafka
> --------------------------------
>
> Key: SPARK-4964
> URL: https://issues.apache.org/jira/browse/SPARK-4964
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Reporter: Cody Koeninger
>
> for background, see
> http://apache-spark-developers-list.1001551.n3.nabble.com/Which-committers-care-about-Kafka-td9827.html
> Requirements:
> - allow client code to implement exactly-once end-to-end semantics for Kafka
> messages, in cases where their output storage is either idempotent or
> transactional
> - allow client code access to Kafka offsets, rather than automatically
> committing them
> - do not assume Zookeeper as a repository for offsets (for the transactional
> case, offsets need to be stored in the same store as the data)
> - allow failure recovery without lost or duplicated messages, even in cases
> where a checkpoint cannot be restored (for instance, because code must be
> updated)
> Design:
> The basic idea is to make an rdd where each partition corresponds to a given
> Kafka topic, partition, starting offset, and ending offset. That allows for
> deterministic replay of data from Kafka (as long as there is enough log
> retention).
> Client code is responsible for committing offsets, either transactionally to
> the same store that data is being written to, or in the case of idempotent
> data, after data has been written.
> PR of a sample implementation for both the batch and dstream case is
> forthcoming.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]