[ https://issues.apache.org/jira/browse/KAFKA-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kevin j staiger updated KAFKA-12544:
------------------------------------
    Description:
Hi, we are experiencing a strange issue with a Kafka topic where it seems like a particular consumer gets stuck in a bad state. We're running an 8-pod Kubernetes cluster with 2 threads per pod and 16 partitions. Things run smoothly for a while, and then one of the pods (with 2 consumers and 2 partitions) becomes very intermittent in its read rate and its partition lag spikes. Eventually all of the pods switch from reading at a steady rate to this spiky, intermittent rate. The CPU on the pods seems normal and the byte rate of the events seems fine. Any idea why certain consumers can get into this state, where there seem to be gaps of 0 operations happening and the lag continually increases?

Please let me know if you need any more details.

Thanks

  (was: the same description, without the request for more details)

> Particular partitions lagging and consumers intermittently reading
> ------------------------------------------------------------------
>
>                 Key: KAFKA-12544
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12544
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>         Environment: production
>            Reporter: kevin j staiger
>            Priority: Major
>         Attachments: Screen Shot 2021-03-17 at 4.04.50 PM.png, Screen Shot 2021-03-23 at 8.42.13 PM.png

--
This message was sent by Atlassian Jira
(v8.3.4#803005)