[
https://issues.apache.org/jira/browse/SPARK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258668#comment-14258668
]
Apache Spark commented on SPARK-4964:
-------------------------------------
User 'koeninger' has created a pull request for this issue:
https://github.com/apache/spark/pull/3798
> Exactly-once semantics for Kafka
> --------------------------------
>
> Key: SPARK-4964
> URL: https://issues.apache.org/jira/browse/SPARK-4964
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Reporter: Cody Koeninger
>
> for background, see
> http://apache-spark-developers-list.1001551.n3.nabble.com/Which-committers-care-about-Kafka-td9827.html
> Requirements:
> - allow client code to implement exactly-once end-to-end semantics for Kafka
> messages, in cases where their output storage is either idempotent or
> transactional
> - allow client code access to Kafka offsets, rather than automatically
> committing them
> - do not assume Zookeeper as a repository for offsets (for the transactional
> case, offsets need to be stored in the same store as the data)
> - allow failure recovery without lost or duplicated messages, even in cases
> where a checkpoint cannot be restored (for instance, because code must be
> updated)
> Design:
> The basic idea is to make an rdd where each partition corresponds to a given
> Kafka topic, partition, starting offset, and ending offset. That allows for
> deterministic replay of data from Kafka (as long as there is enough log
> retention).
> Client code is responsible for committing offsets, either transactionally to
> the same store that data is being written to, or in the case of idempotent
> data, after data has been written.
> PR of a sample implementation for both the batch and dstream case is
> forthcoming.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]