[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078829#comment-17078829 ]
Jason Gustafson commented on KAFKA-9543:
----------------------------------------

I think this is the same issue as KAFKA-9824. I have been trying to reproduce it in a test case, but no luck so far. I found a case that could result in unexpected out-of-range errors in KAFKA-9835, but I'm not sure that is what we're looking for here, given the coincidence with segment rolling that we have now seen in several independent reports. I guess it's at least theoretically possible that we get a sequence like this:

1. Broker accepts an append and rolls a new segment.
2. Data is written to the new segment.
3. Consumer fetches from the previous log end offset and hits KAFKA-9835, which results in receiving uncommitted data.
4. Consumer fetches again from the new log end offset, which results in the out-of-range error.
5. Broker updates the new log end offset.

This would require KAFKA-9835 (or some similar error) to be hit in combination with an edge case like the one [~brianj] mentioned above. I'm having a hard time accepting this, though. In my testing I added an explicit sleep between the segment append and the update of the log end offset, and I still couldn't reproduce a sequence like the one above. It's possible I'm missing some detail, though.

If anyone has a way to reproduce this issue reliably, it would help to have a dump of the segments spanning the log roll. The main thing I want to understand is whether the "out of range" data is in the new segment or the old one.

> Consumer offset reset after new segment rolling
> -----------------------------------------------
>
>                 Key: KAFKA-9543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9543
>             Project: Kafka
>          Issue Type: Bug
> Affects Versions: 2.4.0
>        Reporter: Rafał Boniecki
>        Priority: Major
>     Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png
>
> After upgrading from Kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 [2020-02-12T11:12:58,402][INFO][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code}
> All resets are perfectly correlated with new segments being rolled on the broker: the segment is rolled first, then, a couple of milliseconds later, the reset occurs on the consumer. Attached is a Grafana graph of consumer lag per partition. All sudden spikes in lag are offset resets due to this bug.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
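The five-step sequence hypothesized in the comment above can be sketched as a toy model. This is purely illustrative, not Kafka's actual implementation: the names (`PartitionLog`, `publish_leo`, `fetch`) are invented, and the `fetch` behavior assumes a KAFKA-9835-style flaw in which data beyond the visible log end offset is served instead of being held back.

```python
class OffsetOutOfRange(Exception):
    pass


class PartitionLog:
    """Toy model: the append lands in the segment before the visible LEO advances."""

    def __init__(self):
        self.records = []        # records physically present in the segment files
        self.log_end_offset = 0  # LEO visible to fetches; updated *after* the append

    def append(self, record):
        # Steps 1-2: the record lands in the (possibly freshly rolled) segment,
        # but the visible log end offset is not advanced yet (that is step 5).
        self.records.append(record)

    def publish_leo(self):
        # Step 5: the broker finally advances the visible log end offset.
        self.log_end_offset = len(self.records)

    def fetch(self, offset):
        # Hypothesized KAFKA-9835-style flaw: data beyond the visible LEO is
        # served anyway instead of being withheld until the LEO advances.
        if offset < len(self.records):
            return self.records[offset:]
        if offset > self.log_end_offset:
            raise OffsetOutOfRange(offset)
        return []  # caught up: nothing new yet


log = PartitionLog()
log.append("a")
log.publish_leo()              # offset 0 fully published, visible LEO = 1

log.append("b")                # steps 1-2: segment rolls + append; LEO still 1
batch = log.fetch(1)           # step 3: consumer receives uncommitted "b"
assert batch == ["b"]

try:
    log.fetch(2)               # step 4: fetch from the new end, past visible LEO
    raised = False
except OffsetOutOfRange:
    raised = True
assert raised                  # this is what triggers the consumer's offset reset

log.publish_leo()              # step 5 arrives too late for the consumer
assert log.fetch(2) == []      # offset 2 would have been valid a moment later
```

The point of the sketch is only to show why the window between the segment append and the log-end-offset update matters: the out-of-range error is transient and would not occur if step 5 happened before step 4.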