Bhaskar E created SPARK-23017:
---------------------------------

             Summary: Why would spark-kafka stream fail stating `Got wrong 
record for <groupid> <topic> <partition> even after seeking to offset #` when 
using kafka API to commit offset
                 Key: SPARK-23017
                 URL: https://issues.apache.org/jira/browse/SPARK-23017
             Project: Spark
          Issue Type: Question
          Components: Structured Streaming
    Affects Versions: 2.2.1
            Reporter: Bhaskar E
            Priority: Minor


My spark-kafka streaming job starts failing after successfully processing a number of messages, with the error `Got wrong record for <groupid> <topic> <partition> even after seeking to offset #`.

I have disabled `enable.auto.commit` and am committing offsets manually (to Kafka itself) using the Kafka API:
{code:java}
// Commit the processed offset ranges back to Kafka asynchronously.
((CanCommitOffsets) messages.inputDStream()).commitAsync(offsetRanges.get());
{code}

When I commit offsets to Kafka manually and my job later resumes requesting data from Kafka (say after recovering from a failure an hour later), Kafka should hand back the next available offsets (those following the last committed offset). Since the committed offsets are stored in Kafka itself, my Spark job clearly doesn't know on its own what the next offset to request is. Yet the error message states that it `Got wrong record ... even after seeking to a particular offset #`. *So, how is this possible?*
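Judging from the error text, the check appears to be in Spark's cached Kafka consumer: after seeking, it polls and requires the first record returned to carry exactly the requested offset. A plain-Java sketch of that logic as I understand it (my assumption, not Spark's actual code; `SeekCheckSketch`, `Record`, and `get` are hypothetical stand-ins, and polling is reduced to a pre-built list):

```java
import java.util.List;

// Sketch of the offset check I believe spark-streaming-kafka-0-10 performs
// (assumption; Kafka types replaced by a hypothetical stub).
public class SeekCheckSketch {

    // Minimal stand-in for Kafka's ConsumerRecord (hypothetical stub).
    static class Record {
        final long offset;
        final String value;
        Record(long offset, String value) {
            this.offset = offset;
            this.value = value;
        }
    }

    // After a seek to `requested`, the consumer polls and requires the first
    // record returned to carry exactly that offset; any mismatch fails with
    // a "Got wrong record ... even after seeking" style error.
    static Record get(List<Record> polled, long requested) {
        Record r = polled.get(0);
        if (r.offset != requested) {
            throw new IllegalStateException(
                "Got wrong record even after seeking to offset " + requested);
        }
        return r;
    }

    public static void main(String[] args) {
        // Offset 7 is absent from the partition (e.g. removed by log
        // compaction), so a seek to 7 yields the record at offset 8 instead.
        List<Record> polled = List.of(new Record(8L, "h"));
        try {
            get(polled, 7L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that is indeed what Spark checks, one way the mismatch could arise is an offset that no longer exists in the partition (e.g. removed by log compaction or retention), so the broker returns the next available record rather than the one requested.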

Even if I assume that the Spark driver first fetches the offset ranges from Kafka (before reading the actual records) and then requests those offsets, it is still confusing: how could Spark receive the wrong record when it is requesting offsets that it got from Kafka itself in the first place?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
