[ https://issues.apache.org/jira/browse/SPARK-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401428#comment-16401428 ]
sirisha commented on SPARK-23685:
---------------------------------

[~apachespark] Can anyone please guide me on how to assign this pull request to myself? I do not see an option to assign it to myself.

> Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive
> Offsets (i.e. Log Compaction)
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-23685
>                 URL: https://issues.apache.org/jira/browse/SPARK-23685
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: sirisha
>            Priority: Major
>
> When Kafka does log compaction, offsets often end up with gaps, meaning the
> next requested offset will frequently not be offset+1. The logic in
> KafkaSourceRDD & CachedKafkaConsumer assumes that the next offset will always
> be just an increment of 1. If not, it throws the exception below:
>
> "Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition: XXXX).
> Some data may have been lost because they are not available in Kafka any
> more; either the data was aged out by Kafka or the topic may have been
> deleted before all the data in the topic was processed. If you don't want
> your streaming query to fail on such cases, set the source option
> "failOnDataLoss" to "false"."
>
> FYI: This bug is related to https://issues.apache.org/jira/browse/SPARK-17147
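To illustrate the offset-gap problem, here is a minimal sketch (not Spark's actual KafkaSourceRDD/CachedKafkaConsumer code; all names and sample offsets are hypothetical) that simulates a log-compacted partition, where compaction deletes older records for a key and leaves holes in the offset sequence. `fetch_strict` mimics the reported buggy assumption that every offset in the requested range exists; `fetch_tolerant` shows a gap-skipping alternative.

```python
# Hypothetical compacted partition: offsets 5590 and 5592-5693 were
# removed by compaction, so surviving offsets are non-consecutive.
compacted_log = {5589: "a", 5591: "b", 5694: "c"}

def fetch_strict(log, start, end):
    """Mimics the buggy assumption: expects every offset in [start, end)
    to be present, and fails on the first gap left by compaction."""
    records = []
    for offset in range(start, end):
        if offset not in log:
            raise RuntimeError(
                f"Cannot fetch records in [{start}, {end}): offset {offset} missing"
            )
        records.append(log[offset])
    return records

def fetch_tolerant(log, start, end):
    """Skips offsets that no longer exist instead of treating a gap
    as data loss: returns only the surviving records in the range."""
    return [log[o] for o in sorted(log) if start <= o < end]

print(fetch_tolerant(compacted_log, 5589, 5695))  # ['a', 'b', 'c']
```

With `fetch_strict`, the same range raises at offset 5590, analogous to the "Cannot fetch records in [5589, 5693)" failure quoted in the issue description.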