[ https://issues.apache.org/jira/browse/KAFKA-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varsha Abhinandan updated KAFKA-10313:
--------------------------------------
    Priority: Critical  (was: Major)

Out of range offset errors leading to offset reset
--------------------------------------------------

                Key: KAFKA-10313
                URL: https://issues.apache.org/jira/browse/KAFKA-10313
            Project: Kafka
         Issue Type: Bug
         Components: consumer
   Affects Versions: 2.2.2
           Reporter: Varsha Abhinandan
           Priority: Critical

Hi,

We have been occasionally noticing offset resets on the Kafka consumer caused by offset-out-of-range errors. However, I don't see any errors in the broker logs: nothing related to leader election, replica lag, or Kafka broker pod restarts (only INFO logging was enabled in the prod environment).

From the logs, the out-of-range error appears to be caused by the fetch offset being larger than the offset range on the broker. We have seen this happen multiple times on different consumers and stream apps in the prod environment, so it does not look like an application bug and more like a bug in the KafkaConsumer. We would like to understand the cause of such errors.

Also, none of the offset reset options are desirable. Choosing "earliest" creates a sudden huge lag (we have a retention of 24 hours), and choosing "latest" leads to data loss (the records produced between the out-of-range error and the moment the offset reset happens on the consumer). So we wonder whether the Kafka client should apply the 'auto.offset.reset' config only when no offset is found, and handle out-of-range errors directionally instead: automatically reset to latest if the fetch offset is higher than the log end offset (preventing data loss), and to earliest if the fetch offset is lower than the log start offset. (A sketch of doing this in the application today follows the consumer logs below.)

Following are the logs on the consumer side:

{noformat}
[2020-07-17T08:46:00,322Z] [INFO ] [pipeline-thread-12 ([prd453-19-event-upsert]-bo-pipeline-12)] [o.a.k.c.consumer.internals.Fetcher] [Consumer clientId=bo-indexer-group-prd453-19-on-c19-bo-indexer-upsert-blue-5d665bcbb7-dnvkh-pid-1-kafka-message-source-id-544, groupId=bo-indexer-group-prd453-19] Fetch offset 476383711 is out of range for partition prd453-19-event-upsert-32, resetting offset
[2020-07-17T08:46:00,330Z] [INFO ] [pipeline-thread-12 ([prd453-19-event-upsert]-bo-pipeline-12)] [o.a.k.c.consumer.internals.Fetcher] [Consumer clientId=bo-indexer-group-prd453-19-on-c19-bo-indexer-upsert-blue-5d665bcbb7-dnvkh-pid-1-kafka-message-source-id-544, groupId=bo-indexer-group-prd453-19] Resetting offset for partition prd453-19-event-upsert-32 to offset 453223789.
{noformat}
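As an illustration of the suggestion above, the sketch below shows one way an application can implement directional resets itself today: with auto.offset.reset=none the consumer throws OffsetOutOfRangeException from poll() instead of resetting silently, and the application can seek to the log end when the fetch offset is above the range and to the log start when it is below. This is a minimal sketch, not the code from this report; the bootstrap address, group, and topic names are placeholders, and with "none" a missing committed offset also raises NoOffsetForPartitionException, which would need separate handling.

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetOutOfRangeException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DirectionalOffsetReset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");           // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // "none" makes the consumer throw instead of applying earliest/latest silently.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));  // placeholder topic
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    // ... process records ...
                } catch (OffsetOutOfRangeException e) {
                    // Decide the reset direction per partition instead of one global policy.
                    Map<TopicPartition, Long> outOfRange = e.offsetOutOfRangePartitions();
                    Map<TopicPartition, Long> start = consumer.beginningOffsets(outOfRange.keySet());
                    Map<TopicPartition, Long> end = consumer.endOffsets(outOfRange.keySet());
                    for (Map.Entry<TopicPartition, Long> entry : outOfRange.entrySet()) {
                        TopicPartition tp = entry.getKey();
                        long fetchOffset = entry.getValue();
                        // Above the log end -> seek to latest (nothing is skipped);
                        // below the log start -> seek to earliest.
                        long target = fetchOffset > end.get(tp) ? end.get(tp) : start.get(tp);
                        consumer.seek(tp, target);
                    }
                }
            }
        }
    }
}
{code}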
Broker logs for the partition:

{noformat}
[2020-07-17T07:40:12,082Z] [INFO ] [kafka-scheduler-4] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Found deletable segments with base offsets [452091893] due to retention time 86400000ms breach
[2020-07-17T07:40:12,082Z] [INFO ] [kafka-scheduler-4] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Scheduling log segment [baseOffset 452091893, size 1073741693] for deletion.
[2020-07-17T07:40:12,083Z] [INFO ] [kafka-scheduler-4] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Incrementing log start offset to 453223789
[2020-07-17T07:41:12,083Z] [INFO ] [kafka-scheduler-7] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Deleting segment 452091893
[2020-07-17T07:41:12,114Z] [INFO ] [kafka-scheduler-7] [kafka.log.LogSegment] Deleted log /data/kafka/prd453-19-event-upsert-32/00000000000452091893.log.deleted.
[2020-07-17T07:41:12,114Z] [INFO ] [kafka-scheduler-7] [kafka.log.LogSegment] Deleted offset index /data/kafka/prd453-19-event-upsert-32/00000000000452091893.index.deleted.
[2020-07-17T07:41:12,114Z] [INFO ] [kafka-scheduler-7] [kafka.log.LogSegment] Deleted time index /data/kafka/prd453-19-event-upsert-32/00000000000452091893.timeindex.deleted.
[2020-07-17T07:52:31,836Z] [INFO ] [data-plane-kafka-request-handler-3] [kafka.log.ProducerStateManager] [ProducerStateManager partition=prd453-19-event-upsert-32] Writing producer snapshot at offset 475609786
[2020-07-17T07:52:31,836Z] [INFO ] [data-plane-kafka-request-handler-3] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Rolled new log segment at offset 475609786 in 1 ms.
{noformat}

{noformat}
[2020-07-17T09:05:12,075Z] [INFO ] [kafka-scheduler-2] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Found deletable segments with base offsets [453223789] due to retention time 86400000ms breach
[2020-07-17T09:05:12,075Z] [INFO ] [kafka-scheduler-2] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Scheduling log segment [baseOffset 453223789, size 1073741355] for deletion.
[2020-07-17T09:05:12,075Z] [INFO ] [kafka-scheduler-2] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Incrementing log start offset to 454388428
[2020-07-17T09:06:12,075Z] [INFO ] [kafka-scheduler-6] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Deleting segment 453223789
[2020-07-17T09:06:12,108Z] [INFO ] [kafka-scheduler-6] [kafka.log.LogSegment] Deleted log /data/kafka/prd453-19-event-upsert-32/00000000000453223789.log.deleted.
[2020-07-17T09:06:12,108Z] [INFO ] [kafka-scheduler-6] [kafka.log.LogSegment] Deleted offset index /data/kafka/prd453-19-event-upsert-32/00000000000453223789.index.deleted.
[2020-07-17T09:06:12,108Z] [INFO ] [kafka-scheduler-6] [kafka.log.LogSegment] Deleted time index /data/kafka/prd453-19-event-upsert-32/00000000000453223789.timeindex.deleted.
[2020-07-17T09:06:56,682Z] [INFO ] [data-plane-kafka-request-handler-1] [kafka.log.ProducerStateManager] [ProducerStateManager partition=prd453-19-event-upsert-32] Writing producer snapshot at offset 476681303
[2020-07-17T09:06:56,683Z] [INFO ] [data-plane-kafka-request-handler-1] [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Rolled new log segment at offset 476681303 in 1 ms.
{noformat}
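For cross-checking incidents like the one above, a snapshot of the partition's current offset range can be taken when the consumer reports the reset, so the out-of-range fetch offset (476383711 here) can be compared against the broker's [log start, log end) range. This is a hedged diagnostic sketch, not part of the report; the bootstrap address and class name are placeholders, while beginningOffsets/endOffsets are standard KafkaConsumer calls available in 2.2.

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class OffsetRangeSnapshot {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);

        TopicPartition tp = new TopicPartition("prd453-19-event-upsert", 32);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            long logStart = consumer.beginningOffsets(Collections.singleton(tp)).get(tp);
            long logEnd = consumer.endOffsets(Collections.singleton(tp)).get(tp);
            // Compare the consumer's reported fetch offset (476383711 in the logs above)
            // against the [logStart, logEnd) range the broker reports right now.
            System.out.printf("%s: logStart=%d logEnd=%d%n", tp, logStart, logEnd);
        }
    }
}
{code}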
Kafka version: 2.2.2
Number of brokers: 9
Log retention of the topic: 24 hours

A few broker configs:

min.insync.replicas = 1
default.replication.factor = 3
unclean.leader.election.enable = false
auto.leader.rebalance.enable = true
log.cleaner.delete.retention.ms = 86400000
log.cleaner.enable = true
log.retention.hours = 24
replica.high.watermark.checkpoint.interval.ms = 5000
replica.lag.time.max.ms = 25000
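To rule out unexpected topic-level overrides of the retention settings listed above, the effective topic configuration can be read with the AdminClient. This is a minimal sketch under the assumption of a reachable broker at a placeholder address; describeConfigs is part of the standard AdminClient API.

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ShowTopicRetention {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "prd453-19-event-upsert");
            // Fetch the effective config (defaults plus any per-topic overrides).
            Config config = admin.describeConfigs(Collections.singleton(topic)).all().get().get(topic);
            for (String name : Arrays.asList("retention.ms", "retention.bytes", "segment.bytes")) {
                ConfigEntry entry = config.get(name);
                System.out.println(name + " = " + (entry == null ? "<not set>" : entry.value()));
            }
        }
    }
}
{code}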