guozhangwang commented on pull request #9354: URL: https://github.com/apache/kafka/pull/9354#issuecomment-701557127
> One thing I don't yet understand is, why did this affect all three threads on a single instance (at about the same time)? Was it just because the instance in question didn't have caught-up state for any active tasks, thus it was only assigned stateless tasks across all three threads?
>
> Also, if the thread is only assigned stateless tasks, then shouldn't it reach RUNNING and therefore start to call `commit` _earlier_ than a thread with some stateful tasks? But we observed that the threads on the problem-instance seem to never rejoin the group at all, right? Is that just another symptom of this bug?

From the soak logs, what I observed is that the three threads from that client stopped making any log entries at different times, roughly 10 minutes apart, but the pattern is the same each time: the thread received an assignment whose active tasks were all stateless, the heartbeat thread reported an error right after the assignment, and the tasks then completed initialization but never completed restoration. Note that since the tasks are stateless, they should normally transition to running in the very next iteration.

Without lower-level logs I cannot tell for sure, but since the thread-process-rate of those threads does drop to zero, my suspicion is that the heartbeat error did not mark the consumer as needing to re-join, and the poll call may have blocked in `pollForHeartbeat`, because the state was set to `UNJOINED`, which causes its poll timeout to be `MAX_VALUE`.

That being said, I cannot comfortably say with 100 percent confidence that this is "the" root cause of what we observed in the soak cluster, but at least it is "an" issue that I can identify from the logs.
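
To make the suspected mechanism concrete, here is a minimal toy model of it. It is not Kafka's coordinator code: the class, enum, and method names are all hypothetical, and the only assumption carried over from the discussion above is that a member that is not joined has no heartbeat to wait for, so its wait is effectively unbounded unless something has flagged it to re-join.

```java
// Toy model only: this is NOT Kafka's coordinator code.
// Every name below is hypothetical and chosen for illustration.
final class HeartbeatWaitSketch {

    enum MemberState { UNJOINED, STABLE }

    // Heartbeats are only scheduled for a stable member. For any other state
    // the model has nothing to wait for and returns Long.MAX_VALUE.
    static long timeToNextHeartbeatMs(MemberState state, long intervalMs, long sinceLastMs) {
        if (state != MemberState.STABLE) {
            return Long.MAX_VALUE;
        }
        return Math.max(0L, intervalMs - sinceLastMs);
    }

    // How long a poll would wait before doing coordinator work again.
    // If a re-join was requested, the model re-joins immediately (wait of 0).
    static long pollWaitMs(MemberState state, boolean rejoinNeeded,
                           long intervalMs, long sinceLastMs) {
        if (rejoinNeeded) {
            return 0L;
        }
        return timeToNextHeartbeatMs(state, intervalMs, sinceLastMs);
    }

    public static void main(String[] args) {
        // Healthy member: waits for the remainder of the heartbeat interval.
        System.out.println(pollWaitMs(MemberState.STABLE, false, 3000L, 1000L));
        // Error path that also requests a re-join: no waiting.
        System.out.println(pollWaitMs(MemberState.UNJOINED, true, 3000L, 1000L));
        // Suspected buggy path: UNJOINED but no re-join requested.
        System.out.println(pollWaitMs(MemberState.UNJOINED, false, 3000L, 1000L));
    }
}
```

In this model, the third case is the problematic combination: the state says the member is out of the group, nothing asks it to re-join, and the computed wait is unbounded, which would be consistent with a thread that goes silent and whose process rate drops to zero.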
