Bhaskar E created SPARK-23017:
---------------------------------
Summary: Why would spark-kafka stream fail stating `Got wrong
record for <groupid> <topic> <partition> even after seeking to offset #` when
using kafka API to commit offset
Key: SPARK-23017
URL: https://issues.apache.org/jira/browse/SPARK-23017
Project: Spark
Issue Type: Question
Components: Structured Streaming
Affects Versions: 2.2.1
Reporter: Bhaskar E
Priority: Minor
My spark-kafka streaming job runs fine for a number of messages and then starts
failing with `Got wrong record for <groupid> <topic> <partition> even after
seeking to offset #`.
I disabled `enable.auto.commit` and commit the offsets (to Kafka itself)
manually using the Kafka API:
{code:java}
((CanCommitOffsets) messages.inputDStream()).commitAsync(offsetRanges.get());
{code}
Since I commit offsets to Kafka manually, when my job resumes and requests data
(say after 1 hr, recovering from some failure), Kafka should hand back the next
available offsets (starting from the last committed offset).
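To make the resume semantics above concrete: in Kafka, the value committed for a partition is the offset of the *next* record to consume, so a consumer that restarts picks up exactly where the committed range ended. A minimal sketch of that bookkeeping, using a hypothetical `OffsetTracker` class (no real Kafka involved):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of Kafka's committed-offset semantics: the committed
// value for a partition is the offset of the NEXT record to consume.
public class OffsetTracker {
    private final Map<Integer, Long> committed = new HashMap<>();

    // Commit after processing records [from, until): Kafka's convention is
    // to commit the end of the processed range, i.e. `until`.
    public void commitRange(int partition, long from, long until) {
        committed.put(partition, until);
    }

    // Position a resuming consumer seeks to for this partition
    // (0 if nothing was ever committed).
    public long resumePosition(int partition) {
        return committed.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        tracker.commitRange(0, 0L, 100L);              // processed offsets 0..99
        System.out.println(tracker.resumePosition(0)); // next fetch starts at 100
    }
}
```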
So, since Kafka itself stores my committed offsets, my Spark job on its own
doesn't know what the next offset to request is. Yet the error message says it
`Got wrong record .... even after seeking to a particular offset #`.
**So, how is this possible?**
Even if I assume the Spark driver first obtains the offset ranges from Kafka
(before reading the actual records) and then requests those records, it is
still confusing: how could Spark receive the wrong record when it is requesting
offsets that it got from Kafka itself in the first place?
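For context on the quoted error: the wording matches an executor-side sanity check in Spark's Kafka consumer, which, after seeking to the offset the driver asked for, verifies that the first record actually returned carries that exact offset, and fails otherwise. One way this can trigger is when the requested offset no longer exists (e.g. a compacted topic or records removed by retention). A simplified, self-contained sketch of that kind of check, using a plain sorted map in place of a real consumer:

```java
import java.util.TreeMap;

// Simplified model of the check behind "Got wrong record ... even after
// seeking to offset": after a seek, the first record polled must carry
// exactly the requested offset.
public class SeekCheck {
    private final TreeMap<Long, String> log = new TreeMap<>();

    public void append(long offset, String value) {
        log.put(offset, value);
    }

    // Seek to `requested` and return the record there; throw if the first
    // available record has a different offset (e.g. the topic was compacted
    // or retention already deleted the requested record).
    public String fetch(long requested) {
        Long actual = log.ceilingKey(requested);
        if (actual == null || actual != requested) {
            throw new IllegalStateException(
                "Got wrong record even after seeking to offset " + requested
                + " (first available: " + actual + ")");
        }
        return log.get(actual);
    }

    public static void main(String[] args) {
        SeekCheck consumer = new SeekCheck();
        consumer.append(5L, "a");
        consumer.append(7L, "b");               // offset 6 missing (compacted away)
        System.out.println(consumer.fetch(5L)); // prints "a"
        try {
            consumer.fetch(6L);                 // seeks to 6, first record is 7
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the "wrong record" failure
        }
    }
}
```

This is only an illustration of the failure mode, not Spark's actual code path; the real check lives in Spark's cached Kafka consumer on the executors.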
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]