[
https://issues.apache.org/jira/browse/SPARK-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Somogyi resolved SPARK-23685.
-----------------------------------
Resolution: Information Provided
> Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive
> Offsets (i.e. Log Compaction)
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-23685
> URL: https://issues.apache.org/jira/browse/SPARK-23685
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.2.0
> Reporter: sirisha
> Priority: Major
>
> When Kafka performs log compaction, offsets often end up with gaps, meaning
> the next requested offset will frequently not be offset+1. The logic in
> KafkaSourceRDD & CachedKafkaConsumer assumes that the next offset is always
> exactly one greater than the previous one. If it is not, it throws the
> exception below:
>
> "Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition:XXXX).
> Some data may have been lost because they are not available in Kafka any
> more; either the data was aged out by Kafka or the topic may have been
> deleted before all the data in the topic was processed. If you don't want
> your streaming query to fail on such cases, set the source option
> "failOnDataLoss" to "false". "
>
> FYI: This bug is related to https://issues.apache.org/jira/browse/SPARK-17147
>
>