[
https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
L. C. Hsieh updated SPARK-34187:
--------------------------------
Description:
We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, we
do offset validation by checking if the offset is in available offset range.
But currently we obtain latest available offset range to do the check. It looks
not correct as the available offset range could be changed during the batch, so
the available offset range is different than the one when we polling the
records from Kafka.
It is possible that an offset is valid when polling, but at the time we do the
above check, it is out of latest available offset range. We will wrongly
consider it as data loss case and fail the query or drop the record.
> Use available offset range obtainted during polling when checking offset
> validation
> -----------------------------------------------------------------------------------
>
> Key: SPARK-34187
> URL: https://issues.apache.org/jira/browse/SPARK-34187
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.7, 3.0.1, 3.1.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
>
> We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`,
> we do offset validation by checking if the offset is in available offset
> range. But currently we obtain latest available offset range to do the check.
> It looks not correct as the available offset range could be changed during
> the batch, so the available offset range is different than the one when we
> polling the records from Kafka.
> It is possible that an offset is valid when polling, but at the time we do
> the above check, it is out of latest available offset range. We will wrongly
> consider it as data loss case and fail the query or drop the record.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]