[ https://issues.apache.org/jira/browse/SPARK-24720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534838#comment-16534838 ]

Quentin Ambard commented on SPARK-24720:
----------------------------------------

I'm not sure how we could implement a robust solution. At first I was thinking 
of skipping the offset when the last offset(s) in the range have no data, but 
as you said, that could cause data loss. Maybe we could adjust the offset range 
before we start consuming the partition: given the range [0, 2), we check 
whether offset 2 is available, and if it isn't, we look for the closest earlier 
offset that holds a record. In this case we would change the range from [0, 2) 
to [0, 1).
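A minimal sketch of the range adjustment proposed above, assuming we can tell which offsets in the range actually hold records. The partition is modelled as a plain dict from offset to record value, with transaction marker offsets simply absent; `adjust_range` and this model are hypothetical illustrations, not part of the Spark Kafka connector's API.

```python
from typing import Dict, Tuple


def adjust_range(partition: Dict[int, str], from_offset: int,
                 until_offset: int) -> Tuple[int, int]:
    """Shrink the half-open range [from_offset, until_offset) so that it ends
    just after the last offset that actually holds a record.

    Hypothetical model: `partition` maps offset -> record value, and offsets
    used by transaction markers are absent from the dict.
    """
    offset = until_offset - 1
    # Walk backwards from the last offset in the range until a record is found.
    while offset >= from_offset and offset not in partition:
        offset -= 1
    # offset + 1 is the new exclusive end; it equals from_offset when the
    # range contains no records at all (an empty range).
    return from_offset, offset + 1


# One message at offset 0, transaction commit marker at offset 1.
partition = {0: "message"}
print(adjust_range(partition, 0, 2))  # (0, 1)
```

The sketch treats a missing offset as definitely being a marker. In a real consumer, an offset with no record could also be data that simply has not been fetched yet, and telling those two cases apart is exactly what makes this approach hard to do robustly without risking data loss.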

> kafka transaction creates Non-consecutive Offsets (due to transaction offset) 
> making streaming fail when failOnDataLoss=true
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24720
>                 URL: https://issues.apache.org/jira/browse/SPARK-24720
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Quentin Ambard
>            Priority: Major
>
> When Kafka transactions are used, sending 1 message to Kafka results in 1 
> offset for the data + 1 offset for the transaction marker.
> When the Spark Streaming Kafka connector reads a topic with non-consecutive 
> offsets, it fails. SPARK-17147 fixed this issue for compacted 
> topics.
> However, SPARK-17147 doesn't fix this issue for Kafka transactions: if 1 
> message + 1 transaction commit are in a partition, Spark will try to read 
> offsets [0, 2). Offset 0 (containing the message) will be read, but offset 1 
> won't return a value, and buffer.hasNext() will be false even after a poll, 
> since no data is present for offset 1 (it's the transaction commit marker).
>  
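To make the failure mode in the issue description concrete, here is a toy simulation. `strict_read` and `DataLossError` are hypothetical and only mimic the described behaviour under failOnDataLoss=true: every offset in [from, until) is expected to return a record, and reading stops with an error once an offset (here the commit marker at offset 1) returns nothing. It is not the connector's actual code.

```python
from typing import Dict, List


class DataLossError(Exception):
    """Raised when an expected offset yields no record (mimics failOnDataLoss=true)."""


def strict_read(partition: Dict[int, str], from_offset: int,
                until_offset: int) -> List[str]:
    """Read every offset in [from_offset, until_offset), failing on any gap."""
    records = []
    for offset in range(from_offset, until_offset):
        record = partition.get(offset)
        if record is None:
            # Nothing is returned for this offset, e.g. a transaction commit marker.
            raise DataLossError(f"no record at offset {offset}")
        records.append(record)
    return records


# Transactional write of 1 message: data at offset 0, commit marker at offset 1.
partition = {0: "message"}
try:
    strict_read(partition, 0, 2)
except DataLossError as e:
    print(e)  # no record at offset 1
```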



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

