[ https://issues.apache.org/jira/browse/KAFKA-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340494#comment-17340494 ]
Salvador Germán Aguallo commented on KAFKA-5381: ------------------------------------------------ [~mjuarez] did you ever find a solution for that issue? > ERROR Uncaught exception in scheduled task 'delete-expired-consumer-offsets' > (kafka.utils.KafkaScheduler) > --------------------------------------------------------------------------------------------------------- > > Key: KAFKA-5381 > URL: https://issues.apache.org/jira/browse/KAFKA-5381 > Project: Kafka > Issue Type: Bug > Reporter: Jeff Widman > Priority: Major > > We have a 6 node cluster of 0.10.0.1 brokers. Broker 4 had a hardware > problem, so we re-assigned all its partitions to other brokers. We > immediately started observing the error described in KAFKA-4362 from several > of our consumers. > However, on broker 6, we also started seeing the following exceptions in > {{KafkaScheduler}} which have a somewhat similar-looking traceback: > {code} > [2017-06-03 17:23:57,926] ERROR Uncaught exception in scheduled task > 'delete-expired-consumer-offsets' (kafka.utils.KafkaScheduler) > java.lang.IllegalArgumentException: Message format version for partition 50 > not found > at > kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633) > at > kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633) > at scala.Option.getOrElse(Option.scala:121) > at > kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632) > at > kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:560) > at > kafka.coordinator.GroupMetadataManager$$anonfun$2$$anonfun$10.apply(GroupMetadataManager.scala:551) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > kafka.coordinator.GroupMetadataManager$$anonfun$2.apply$mcI$sp(GroupMetadataManager.scala:551) > at > kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543) > at > kafka.coordinator.GroupMetadataManager$$anonfun$2.apply(GroupMetadataManager.scala:543) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231) > at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239) > at > kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$deleteExpiredOffsets(GroupMetadataManager.scala:543) > at > kafka.coordinator.GroupMetadataManager$$anonfun$1.apply$mcV$sp(GroupMetadataManager.scala:87) > at > kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {code} > Unsurprisingly, the error disappeared once {{offsets.retention.minutes}} > passed. > This appears to be similar root cause to KAFKA-4362 where > {{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} is throwing the > error due to the offset partition being moved, but I'm unclear whether the > fix for that version also fixed the {{KafkaScheduler}} or if more work needs > to be done here. > We did the partition re-assignment by using the > {{kafka-reassign-partitions.sh}} script and giving it the five healthy > brokers. From my understanding, this would have randomly re-assigned all > partitions (I don't think its sticky), so probably at least one partition > from the {{__consumer_offsets}} topic was removed from broker 6. However, if > that was the case, I would have expected all brokers to have had these > partitions removed and be throwing this error. But our logging infrastructure > shows that this error was only happening on broker 6, not on the other > brokers. Not sure why that is. -- This message was sent by Atlassian Jira (v8.3.4#803005)