[
https://issues.apache.org/jira/browse/SPARK-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240496#comment-14240496
]
Hari Shreedharan commented on SPARK-4707:
-----------------------------------------
TD and I discussed this and decided that the second option can be implemented
with only a limited number of retries, which keeps the implementation readable
and less complex.
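The bounded-retry idea can be sketched roughly as follows. This is a minimal, hypothetical illustration (the `storeWithRetry` helper and `maxRetries` parameter are assumptions, not the actual Spark patch): attempt the store a fixed number of times, and only treat the data as safe to commit if one attempt succeeds.

```java
import java.util.function.Supplier;

public class BoundedRetryStore {
    // Hypothetical sketch: try store() up to maxRetries times.
    // Returns true if a store attempt succeeded (offsets may then be
    // committed); false if all attempts failed (the caller should stop
    // the receiver rather than advance past uncommitted offsets).
    static boolean storeWithRetry(Supplier<Boolean> store, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (store.get()) {
                return true; // data stored; safe to commit offsets
            }
        }
        return false; // do not commit; restart from the last committed offset
    }

    public static void main(String[] args) {
        // Simulated store that fails twice, then succeeds on the third call.
        int[] calls = {0};
        Supplier<Boolean> flaky = () -> ++calls[0] >= 3;
        System.out.println(storeWithRetry(flaky, 5)); // succeeds within 5 tries
        calls[0] = 0;
        System.out.println(storeWithRetry(flaky, 2)); // exhausts 2 tries, fails
    }
}
```

Because the retry count is bounded, the failure path stays simple: either the block is stored and offsets move forward, or the receiver gives up without silently skipping data.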
> Reliable Kafka Receiver can lose data if the block generator fails to store
> data
> --------------------------------------------------------------------------------
>
> Key: SPARK-4707
> URL: https://issues.apache.org/jira/browse/SPARK-4707
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.2.0
> Reporter: Hari Shreedharan
> Priority: Critical
>
> The Reliable Kafka Receiver commits offsets only when events are actually
> stored, which ensures that on restart we will actually start where we left
> off. But if the failure happens in the store() call and the block generator
> reports an error, the receiver does nothing and continues reading from the
> current offset rather than the last committed offset. This means that
> messages between the last committed offset and the current offset will be
> lost.
> I will send a PR for this soon - I have a patch that needs some minor
> fixes and testing.
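The failure mode described above can be modeled with a toy offset calculation. This is a hypothetical illustration, not Spark code: the offset values and variable names are made up to show why reading past a failed store loses the intervening messages.

```java
public class OffsetGapDemo {
    // Model of the reported bug: offsets are committed only after a
    // successful store, but on a store failure the receiver keeps
    // reading from its in-memory position instead of rewinding.
    public static void main(String[] args) {
        long lastCommitted = 100;  // last offset whose data was durably stored
        long currentOffset = 150;  // reader position when store() failed

        // Correct behavior: rewind to the last committed offset,
        // so offsets 100..149 are re-read and re-stored.
        long safeResumePoint = lastCommitted;

        // Buggy behavior: keep reading from currentOffset, so the
        // messages in [lastCommitted, currentOffset) are never stored.
        long lostMessages = currentOffset - lastCommitted;

        System.out.println(safeResumePoint); // 100
        System.out.println(lostMessages);    // 50
    }
}
```

The fix is to ensure the receiver never advances past data that has not been durably stored, whether by rewinding to the committed offset or by stopping after bounded retries fail.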
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)