[ https://issues.apache.org/jira/browse/SPARK-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400755#comment-16400755 ]
Apache Spark commented on SPARK-23685:
--------------------------------------

User 'sirishaSindri' has created a pull request for this issue:
https://github.com/apache/spark/pull/20836

> Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23685
>                 URL: https://issues.apache.org/jira/browse/SPARK-23685
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: sirisha
>            Priority: Major
>
> When Kafka performs log compaction, offsets often end up with gaps, meaning
> the next offset will frequently not be offset+1. The logic in
> KafkaSourceRDD & CachedKafkaConsumer assumes that the next offset will always
> be an increment of 1. If it is not, it throws the exception below:
>
> "Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition:XXXX).
> Some data may have been lost because they are not available in Kafka any
> more; either the data was aged out by Kafka or the topic may have been
> deleted before all the data in the topic was processed. If you don't want
> your streaming query to fail on such cases, set the source option
> "failOnDataLoss" to "false". "
>
> FYI: This bug is related to https://issues.apache.org/jira/browse/SPARK-17147
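
To make the failure mode in the report concrete, here is a minimal, self-contained sketch in plain Python. It is not Spark's or Kafka's code: the in-memory log, `poll_from`, `read_strict`, and `read_gap_tolerant` are all hypothetical names invented for illustration. It only models the idea that a reader expecting offset+1 fails on a compacted partition, while a reader that advances from the offset it actually received does not.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical compacted partition: offsets 2, 3, 6, 7 and 8 no longer exist.
COMPACTED_LOG: Dict[int, str] = {0: "a", 1: "b", 4: "c", 5: "d", 9: "e"}

Record = Tuple[int, str]


def poll_from(log: Dict[int, str], offset: int) -> Optional[Record]:
    """Return the first surviving record at or after `offset`, or None."""
    candidates = [o for o in log if o >= offset]
    if not candidates:
        return None
    nxt = min(candidates)
    return nxt, log[nxt]


def read_strict(log: Dict[int, str], start: int, until: int) -> List[Record]:
    """Assume every offset in [start, until) exists; fail on the first gap."""
    out: List[Record] = []
    for expected in range(start, until):
        record = poll_from(log, expected)
        if record is None or record[0] != expected:
            raise RuntimeError(f"Cannot fetch records in [{expected}, {until})")
        out.append(record)
    return out


def read_gap_tolerant(log: Dict[int, str], start: int, until: int) -> List[Record]:
    """Advance from the offset actually returned, so gaps are skipped."""
    out: List[Record] = []
    next_offset = start
    while next_offset < until:
        record = poll_from(log, next_offset)
        if record is None or record[0] >= until:
            break  # nothing left inside the requested range
        out.append(record)
        next_offset = record[0] + 1
    return out
```

Under these assumptions, `read_strict(COMPACTED_LOG, 0, 10)` raises as soon as it reaches the missing offset 2, while `read_gap_tolerant(COMPACTED_LOG, 0, 10)` returns the five surviving records. Whether the linked pull request takes this approach is not stated in the message above.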