[ 
https://issues.apache.org/jira/browse/KAFKA-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935998#comment-16935998
 ] 

ASF GitHub Bot commented on KAFKA-8896:
---------------------------------------

abbccdda commented on pull request #7377: KAFKA-8896: Check group state before 
completing delayed heartbeat
URL: https://github.com/apache/kafka/pull/7377
 
 
   This PR is a defensive fix for the reported bug KAFKA-8896, which would 
cause a group coordinator crash when the heartbeat member was not found.
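
A minimal sketch of the kind of guard the PR title describes, written in Java for illustration. Every name here (`HeartbeatGuardSketch`, `Group`, `State`, `tryComplete`) is hypothetical and is not Kafka's actual API or the actual patch. It assumes a delayed heartbeat can simply be completed and dropped when the group is dead or the member is no longer registered, instead of performing a lookup that throws.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

final class HeartbeatGuardSketch {
    // Hypothetical group states; a real coordinator has more of them.
    enum State { STABLE, DEAD }

    // Hypothetical stand-in for per-group metadata.
    static final class Group {
        State state = State.STABLE;
        final Map<String, Long> lastHeartbeatMs = new HashMap<>();

        // Lookup that throws on a missing key, like the call in the stack trace.
        long get(String memberId) {
            Long ts = lastHeartbeatMs.get(memberId);
            if (ts == null) {
                throw new NoSuchElementException("key not found: " + memberId);
            }
            return ts;
        }
    }

    // Returns true when the delayed heartbeat can be completed and dropped.
    static boolean tryComplete(Group group, String memberId,
                               long sessionTimeoutMs, long deadlineMs) {
        synchronized (group) {
            // Guard 1: the group was unloaded, so nothing is left to wait for.
            if (group.state == State.DEAD) {
                return true;
            }
            // Guard 2: the member is gone, so skip the throwing lookup.
            if (!group.lastHeartbeatMs.containsKey(memberId)) {
                return true;
            }
            // Normal path: complete if the member's latest heartbeat
            // carries it past the deadline.
            return group.get(memberId) + sessionTimeoutMs > deadlineMs;
        }
    }

    public static void main(String[] args) {
        Group group = new Group();
        group.lastHeartbeatMs.put("member-1", 1000L);
        System.out.println(tryComplete(group, "member-1", 500L, 1200L));
        System.out.println(tryComplete(group, "unknown-member", 500L, 1200L));
    }
}
```

Both guards run under the same lock as the lookup, so the member cannot disappear between the check and the read.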
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> NoSuchElementException after coordinator move
> ---------------------------------------------
>
>                 Key: KAFKA-8896
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8896
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Boyang Chen
>            Priority: Major
>
> Caught this exception in the wild:
> {code:java}
> java.util.NoSuchElementException: key not found: consumer-group-38981ebe-4361-44e7-b710-7d11f5d35639
>       at scala.collection.MapLike.default(MapLike.scala:235)
>       at scala.collection.MapLike.default$(MapLike.scala:234)
>       at scala.collection.AbstractMap.default(Map.scala:63)
>       at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
>       at kafka.coordinator.group.GroupMetadata.get(GroupMetadata.scala:214)
>       at kafka.coordinator.group.GroupCoordinator.$anonfun$tryCompleteHeartbeat$1(GroupCoordinator.scala:1008)
>       at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>       at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
>       at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
>       at kafka.coordinator.group.GroupCoordinator.tryCompleteHeartbeat(GroupCoordinator.scala:1001)
>       at kafka.coordinator.group.DelayedHeartbeat.tryComplete(DelayedHeartbeat.scala:34)
>       at kafka.server.DelayedOperation.maybeTryComplete(DelayedOperation.scala:122)
>       at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperation.scala:391)
>       at kafka.server.DelayedOperationPurgatory.checkAndComplete(DelayedOperation.scala:295)
>       at kafka.coordinator.group.GroupCoordinator.completeAndScheduleNextExpiration(GroupCoordinator.scala:802)
>       at kafka.coordinator.group.GroupCoordinator.completeAndScheduleNextHeartbeatExpiration(GroupCoordinator.scala:795)
>       at kafka.coordinator.group.GroupCoordinator.$anonfun$handleHeartbeat$2(GroupCoordinator.scala:543)
>       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>       at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
>       at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:209)
>       at kafka.coordinator.group.GroupCoordinator.handleHeartbeat(GroupCoordinator.scala:516)
>       at kafka.server.KafkaApis.handleHeartbeatRequest(KafkaApis.scala:1617)
>       at kafka.server.KafkaApis.handle(KafkaApis.scala:155)
> {code}
>  
> Looking at the logs, I see a coordinator change just prior to this exception. 
> The group was first unloaded as the coordinator moved to another broker and 
> was then loaded again as the coordinator moved back. I am guessing that the 
> delayed heartbeat is somehow retaining a reference to the old GroupMetadata 
> instance, though I am not sure exactly how this can happen.
>  
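
The reporter's guess can be illustrated with a small Java sketch. This is not Kafka code: `StaleGroupReferenceSketch`, `Group`, `load`, and `unload` are hypothetical names, and the sketch assumes that unloading leaves the old instance without its members. Under that assumption, an operation that captured the old instance before the unload fails with the same kind of "key not found" error, while the freshly loaded instance still resolves the member.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

final class StaleGroupReferenceSketch {
    // Hypothetical stand-in for one loaded copy of a group's metadata.
    static final class Group {
        private final Set<String> members = new HashSet<>();

        void add(String memberId) { members.add(memberId); }

        void clear() { members.clear(); }

        // Throws on a missing member, like the lookup in the stack trace.
        String get(String memberId) {
            if (!members.contains(memberId)) {
                throw new NoSuchElementException("key not found: " + memberId);
            }
            return memberId;
        }
    }

    // Hypothetical cache of loaded groups, keyed by group id.
    static final Map<String, Group> cache = new HashMap<>();

    // Loading always builds a new instance, so older references go stale.
    static Group load(String groupId, String... memberIds) {
        Group group = new Group();
        for (String memberId : memberIds) {
            group.add(memberId);
        }
        cache.put(groupId, group);
        return group;
    }

    // Assumption: unloading empties the instance that is being dropped.
    static void unload(String groupId) {
        Group group = cache.remove(groupId);
        if (group != null) {
            group.clear();
        }
    }

    public static void main(String[] args) {
        Group captured = load("group-1", "member-1"); // held by a delayed operation
        unload("group-1");                            // coordinator moves away
        Group reloaded = load("group-1", "member-1"); // coordinator moves back
        System.out.println(reloaded.get("member-1"));
        try {
            captured.get("member-1");
        } catch (NoSuchElementException e) {
            System.out.println("stale reference: " + e.getMessage());
        }
    }
}
```

Re-resolving the group from the cache when the delayed operation runs, or checking the captured instance's state first, would avoid the stale lookup in this model.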



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
