[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078829#comment-17078829 ]

Jason Gustafson commented on KAFKA-9543:
----------------------------------------

I think this is the same issue as KAFKA-9824. I have been trying to reproduce 
it in a test case, but no luck so far. I found a case which could result in 
unexpected out of range errors in KAFKA-9835, but I'm not sure that's what 
we're looking for here, given the coincidence with segment rolling, which we 
have now seen in several independent reports. I guess it's at least 
theoretically possible that we get a sequence like this:

1. Broker accepts an append and rolls a new segment.
2. Data is written to the new segment.
3. Consumer fetches from the previous log end offset and hits KAFKA-9835, 
which results in receiving uncommitted data.
4. Consumer fetches again from the new log end offset, which results in the 
out of range error.
5. Broker updates the log end offset.
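
The sequence above can be sketched as a toy model. To be clear, this is 
illustrative Python, not Kafka code: the names (ToyLog, publish_leo, etc.) are 
made up, and the "data visible before the log end offset is published" behavior 
is the hypothesis being discussed, not confirmed broker behavior.

```python
# Toy model of the hypothesized race (NOT Kafka code): the append makes data
# readable before the published log end offset (LEO) is updated, so a fetch at
# the new end against the stale LEO looks "out of range".

class ToyLog:
    def __init__(self):
        self.records = []          # appended records, visible immediately (step 2)
        self.log_end_offset = 0    # published LEO, updated late (step 5)

    def append(self, batch):
        self.records.extend(batch)  # data hits the segment first...
        # ...the LEO update is deliberately deferred to publish_leo()

    def publish_leo(self):
        self.log_end_offset = len(self.records)

    def fetch(self, offset):
        if offset > self.log_end_offset:  # stale LEO -> spurious error
            raise ValueError(
                f"offset {offset} out of range (LEO={self.log_end_offset})")
        return self.records[offset:]

log = ToyLog()
log.append(["a", "b"])
log.publish_leo()                  # LEO = 2

log.append(["c", "d"])             # steps 1-2: rolled segment gets new data
leaked = log.records[2:]           # step 3: consumer sees uncommitted data
try:
    log.fetch(2 + len(leaked))     # step 4: fetch at the new end fails
except ValueError as e:
    print(e)                       # -> offset 4 out of range (LEO=2)
log.publish_leo()                  # step 5: LEO catches up, but too late
```

In this sketch the error only appears in the window between the append and the 
LEO publish, which is why it would correlate so tightly with segment rolls.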

This would require KAFKA-9835 (or some similar error) to be hit, combined 
with an edge case like the one that [~brianj] mentioned above. I'm having a 
hard time accepting this though. In my testing I added an explicit sleep 
between the segment append and the update of the log end offset, and I still 
couldn't manage to reproduce a sequence like the one above. It's possible I'm 
missing some detail though.

If anyone has a way to reproduce this issue reliably, it would help to have a 
dump from the segments spanning the log roll. The main thing I want to 
understand is whether the "out of range" data is on the new segment or the old 
one.


> Consumer offset reset after new segment rolling
> -----------------------------------------------
>
>                 Key: KAFKA-9543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9543
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: RafaƂ Boniecki
>            Priority: Major
>         Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png
>
>
> After upgrading from Kafka 2.1.1 to 2.4.0, I'm experiencing unexpected 
> consumer offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 
> [2020-02-12T11:12:58,402][INFO 
> ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer 
> clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of 
> range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO  
> [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, 
> dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code}
> All resets are perfectly correlated with rolling new segments at the broker: 
> the segment is rolled first, then, a couple of ms later, the reset occurs on 
> the consumer. Attached is a Grafana graph with consumer lag per partition. 
> All sudden spikes in lag are offset resets due to this bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
