[ https://issues.apache.org/jira/browse/SPARK-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400755#comment-16400755 ]

Apache Spark commented on SPARK-23685:
--------------------------------------

User 'sirishaSindri' has created a pull request for this issue:
https://github.com/apache/spark/pull/20836

> Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive 
> Offsets (i.e. Log Compaction)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23685
>                 URL: https://issues.apache.org/jira/browse/SPARK-23685
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: sirisha
>            Priority: Major
>
> When Kafka does log compaction, offsets often end up with gaps, meaning the 
> next requested offset will frequently not be offset+1. The logic in 
> KafkaSourceRDD & CachedKafkaConsumer assumes that the next offset will always 
> be an increment of 1. If it is not, the consumer throws the exception below:
>  
> "Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition:XXXX). 
> Some data may have been lost because they are not available in Kafka any 
> more; either the data was aged out by Kafka or the topic may have been 
> deleted before all the data in the topic was processed. If you don't want 
> your streaming query to fail on such cases, set the source option 
> "failOnDataLoss" to "false". "
>  
> FYI: This bug is related to https://issues.apache.org/jira/browse/SPARK-17147
>  
>  
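For illustration, a minimal sketch of a fetch loop that tolerates compaction
gaps, written against the plain Kafka 0.10 consumer API. The readRange helper
and its parameters are hypothetical, not Spark's actual internals; the point
is that the loop advances its expected offset to record.offset + 1 rather
than asserting that the next record's offset is exactly offset+1:

    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    // Hypothetical helper, not Spark's CachedKafkaConsumer: read records in
    // [fromOffset, untilOffset) from a possibly compacted topic-partition.
    // Instead of failing when an offset is missing (the strict offset+1
    // check that produces the error above), advance the expected offset to
    // record.offset + 1 so gaps left by compaction are skipped.
    def readRange(consumer: KafkaConsumer[Array[Byte], Array[Byte]],
                  tp: TopicPartition,
                  fromOffset: Long,
                  untilOffset: Long): Unit = {
      consumer.seek(tp, fromOffset)
      var nextOffset = fromOffset
      while (nextOffset < untilOffset) {
        // poll(timeoutMs) is the Kafka 0.10 API; newer clients take Duration.
        val records = consumer.poll(500L).records(tp).asScala
        if (records.isEmpty) {
          // Nothing returned: real code should distinguish a transient empty
          // poll from genuine data loss before deciding to fail.
          return
        }
        for (record <- records if record.offset < untilOffset) {
          // With log compaction, record.offset may jump past nextOffset;
          // that is a gap left by compaction, not data loss.
          nextOffset = record.offset + 1
          // ... hand the record to the caller ...
        }
      }
    }

A consumer written this way only treats the range as lost when Kafka returns
nothing at all, rather than whenever the returned offsets are non-consecutive.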


