[ https://issues.apache.org/jira/browse/KAFKA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635089#comment-17635089 ]
HaiyuanZhao edited comment on KAFKA-13639 at 11/17/22 6:23 AM: --------------------------------------------------------------- [~ijuma] Hi, We also have seen similar behavior with java 1.8.0_45. In addition, we encountered this issue with very large values, and not just topic __consumer_offset. The exception log is as followed. Hope this can help. {code:java} [2022-11-16 22:54:52,147] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=419] Unexpected error occurred while processing data for partition test-topic-test-topic500-0000-2 at offset 7434 (kafka.server.ReplicaFetcherThread) kafka.common.OffsetsOutOfOrderException: Out of order offsets found in append to test-topic-test-topic500-0000-2: ArrayBuffer(4126996017, 4126996018, 4126996019, 4126996020, 4126996021, 4126996022, 4126996023, 4126996024, 4126996025, 4126996026, 4126996027, 4126996028, 4126996029, 4126996030, 4126996031, 4126996032, 4126996033, 4126996034, 4126996035, 4126996036, 4126 996037, 4126996038, 4126996039, 4126996040, 4126996041, 4126996042, 4126996043, 4126996044, 4126996045, 4126996046, 4126996047, 4126996048, 4126996049, 4126996050, 4126996051, 412699605 2, 4126996053, 4126996054, 4126996055, 4126996056, 4126996057, 4126996058, 4126996059, 4126996060, 4126996061, 4126996062, 4126996063, 4126996064, 4126996065, 4126996066, 4126996067, 41 26996068, 4126996069, 4126996070, 4126996071, 4126996072, 4126996073, 4126996074, 4126996075, 4126996076, 4126996077, 4126996078, 4126996079, 4126996080, 4126996081, 4126996082, 4126996 083, 4126996084, 4126996085, 4126996086, 4126996087, 4126996088, 4126996089, 4126996090, 4126996091, 4126996092, 4126996093, 4126996094, 4126996095, 4126996096, 4126996097, 4126996098, 4126996099, 4126996100, 4126996101, 4126996102, 4126996103, 4126996104, 4126996105, 4126996106, 4126996107, 4126996108, 4126996109, 4126996110, 4126996111, 4126996112, 4126996113, 41269 96114, 4126996115, 4126996116, 4126996117, 4126996118, 4126996119, 4126996120, 4126996121, 4126996122, 4126996123, 4126996124, 4126996125, 4126996126, 4126996127, 4126996128, 4126996129 , 4126996130, 4126996131, 4126996132, 4126996133, 4126996134, 4126996135, 4126996136, 4126996137, 4126996138, 4126996139, 4126996140, 4126996141, 4126996142, 4071685845, 4071685846, 407 1685847, 4071685848, 4071685849, 4071685850, 4071685851, 4071685852, 4071685853, 4071685854, 4071685855, 4071685856, 4071685857, 4071685858, 4071685859, 4071685860, 4071685861, 40716858 62, 4071685863, 4071685864, 4071685865, 4071685866, 4071685867, 4071685868, 4071685869, 4071685870, 4071685871, 4071685872, 4071685873, 4071685874, 4071685875, 4071685876, 4071685877, 4 071685878, 4071685879, 4071685880, 4071685881, 4071685882, 4071685883, 4071685884, 4071685885, 4071685886, 4071685887, 4071685888, 4071685889, 4071685890, 4071685891, 4071685892, 407168 5893, 4071685894, 4071685895, 4071685896, 4071685897, 4071685898, 4071685899, 4071685900, 4071685901, 4071685902, 4071685903, 4071685904, 4071685905, 4071685906, 4071685907, 4071685908, 4071685909, 4071685910, 4071685911, 4071685912, 4071685913, 4071685914, 4071685915, 4071685916, 4071685917, 4071685918, 4071685919, 4071685920, 4071685921, 4071685922, 4071685923, 4071 685924, 4071685925, 4071685926, 4071685927, 4071685928, 4071685929, 4071685930, 4071685931, 4071685932, 4071685933, 4071685934, 4071685935, 4071685936, 4071685937, 4071685938, 407168593 9, 4071685940, 4071685941, 4071685942, 4071685943, 4071685944, 4071685945, 4071685946, 4071685947, 4071685948, 4071685949, 4071685950, 4071685951, 4071685952, 4071685953, 4071685954, 40 71685955, 4071685956, 4071685957, 4071685958, 4071685959, 4071685960, 4071685961, 4071685962, 4071685963, 4071685964, 4071685965, 4071685966, 4071685967, 4071685968, 4071685969, 4071685 970) at kafka.log.Log.$anonfun$append$2(Log.scala:1165) at kafka.log.Log.append(Log.scala:2409) at kafka.log.Log.appendAsFollower(Log.scala:1073) at kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:1036) at kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:1043) at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:172) at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:339) at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:326) at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:325) at kafka.server.AbstractFetcherThread$$Lambda$1298/1889097333.apply(Unknown Source) at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62) at kafka.server.AbstractFetcherThread$$Lambda$1332/1703195329.apply(Unknown Source) at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:351) at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:347) at scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:309) at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:325) at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:135) at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:134) at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:117) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) {code} was (Author: zhaohaidao): [~ijuma] Hi, We also have seen similar behavior with java 1.8.0_45. In addition, we encountered this issue very large values, and not just topic __consumer_offset. The exception log is as followed. Hope this can help. {code:java} [2022-11-16 22:54:52,147] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=419] Unexpected error occurred while processing data for partition test-topic-test-topic500-0000-2 at offset 7434 (kafka.server.ReplicaFetcherThread) kafka.common.OffsetsOutOfOrderException: Out of order offsets found in append to test-topic-test-topic500-0000-2: ArrayBuffer(4126996017, 4126996018, 4126996019, 4126996020, 4126996021, 4126996022, 4126996023, 4126996024, 4126996025, 4126996026, 4126996027, 4126996028, 4126996029, 4126996030, 4126996031, 4126996032, 4126996033, 4126996034, 4126996035, 4126996036, 4126 996037, 4126996038, 4126996039, 4126996040, 4126996041, 4126996042, 4126996043, 4126996044, 4126996045, 4126996046, 4126996047, 4126996048, 4126996049, 4126996050, 4126996051, 412699605 2, 4126996053, 4126996054, 4126996055, 4126996056, 4126996057, 4126996058, 4126996059, 4126996060, 4126996061, 4126996062, 4126996063, 4126996064, 4126996065, 4126996066, 4126996067, 41 26996068, 4126996069, 4126996070, 4126996071, 4126996072, 4126996073, 4126996074, 4126996075, 4126996076, 4126996077, 4126996078, 4126996079, 4126996080, 4126996081, 4126996082, 4126996 083, 4126996084, 4126996085, 4126996086, 4126996087, 4126996088, 4126996089, 4126996090, 4126996091, 4126996092, 4126996093, 4126996094, 4126996095, 4126996096, 4126996097, 4126996098, 4126996099, 4126996100, 4126996101, 4126996102, 4126996103, 4126996104, 4126996105, 4126996106, 4126996107, 4126996108, 4126996109, 4126996110, 4126996111, 4126996112, 4126996113, 41269 96114, 4126996115, 4126996116, 4126996117, 4126996118, 4126996119, 4126996120, 4126996121, 4126996122, 4126996123, 4126996124, 4126996125, 4126996126, 4126996127, 4126996128, 4126996129 , 4126996130, 4126996131, 4126996132, 4126996133, 4126996134, 4126996135, 4126996136, 4126996137, 4126996138, 4126996139, 4126996140, 4126996141, 4126996142, 4071685845, 4071685846, 407 1685847, 4071685848, 4071685849, 4071685850, 4071685851, 4071685852, 4071685853, 4071685854, 4071685855, 4071685856, 4071685857, 4071685858, 4071685859, 4071685860, 4071685861, 40716858 62, 4071685863, 4071685864, 4071685865, 4071685866, 4071685867, 4071685868, 4071685869, 4071685870, 4071685871, 4071685872, 4071685873, 4071685874, 4071685875, 4071685876, 4071685877, 4 071685878, 4071685879, 4071685880, 4071685881, 4071685882, 4071685883, 4071685884, 4071685885, 4071685886, 4071685887, 4071685888, 4071685889, 4071685890, 4071685891, 4071685892, 407168 5893, 4071685894, 4071685895, 4071685896, 4071685897, 4071685898, 4071685899, 4071685900, 4071685901, 4071685902, 4071685903, 4071685904, 4071685905, 4071685906, 4071685907, 4071685908, 4071685909, 4071685910, 4071685911, 4071685912, 4071685913, 4071685914, 4071685915, 4071685916, 4071685917, 4071685918, 4071685919, 4071685920, 4071685921, 4071685922, 4071685923, 4071 685924, 4071685925, 4071685926, 4071685927, 4071685928, 4071685929, 4071685930, 4071685931, 4071685932, 4071685933, 4071685934, 4071685935, 4071685936, 4071685937, 4071685938, 407168593 9, 4071685940, 4071685941, 4071685942, 4071685943, 4071685944, 4071685945, 4071685946, 4071685947, 4071685948, 4071685949, 4071685950, 4071685951, 4071685952, 4071685953, 4071685954, 40 71685955, 4071685956, 4071685957, 4071685958, 4071685959, 4071685960, 4071685961, 4071685962, 4071685963, 4071685964, 4071685965, 4071685966, 4071685967, 4071685968, 4071685969, 4071685 970) at kafka.log.Log.$anonfun$append$2(Log.scala:1165) at kafka.log.Log.append(Log.scala:2409) at kafka.log.Log.appendAsFollower(Log.scala:1073) at kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:1036) at kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:1043) at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:172) at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:339) at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:326) at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:325) at kafka.server.AbstractFetcherThread$$Lambda$1298/1889097333.apply(Unknown Source) at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62) at kafka.server.AbstractFetcherThread$$Lambda$1332/1703195329.apply(Unknown Source) at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:351) at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:347) at scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:309) at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:325) at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:135) at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:134) at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:117) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) {code} > NotEnoughReplicasException for __consumer_offsets topic due to out of order > offset > ---------------------------------------------------------------------------------- > > Key: KAFKA-13639 > URL: https://issues.apache.org/jira/browse/KAFKA-13639 > Project: Kafka > Issue Type: Bug > Components: core, log > Affects Versions: 2.6.2 > Reporter: Sergey Ivanov > Priority: Major > > Hello, > We faced a strange issue with Kafka during testing failover scenarios: this > assumes forces shutdown nodes where Kafka pods are placed (Kafka is deployed > to Kubernetes), and then return these nodes. > After this Kafka pods are started normally but +some+ consumers could not > connect to it with errors: > > {code:java} > [2022-01-27T14:35:09.051][level=DEBUG][class=kafka_client:utils.go:120]: > Failed to sync group mae_processor: [15] Group Coordinator Not Available: the > broker returns this error code for group coordinator requests, offset > commits, and most group management requests if the offsets topic has not yet > been created, or if the group coordinator is not active{code} > > > It looked like there were issues with ___consumer_offsets_ topic. In logs of > brokers we found this error: > {code:java} > [2022-01-27T14:56:00,233][INFO][category=kafka.coordinator.group.GroupCoordinator] > [GroupCoordinator 1]: Group mae_processor with generation 329 is now empty > (__consumer_offsets-36) > [2022-01-27T14:56:00,233][ERROR][category=kafka.server.ReplicaManager] > [ReplicaManager broker=1] Error processing append operation on partition > __consumer_offsets-36 > org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the > current ISR Set(1) is insufficient to satisfy the min.isr requirement of 2 > for partition __consumer_offsets-36 > [2022-01-27T14:56:00,233][WARN][category=kafka.coordinator.group.GroupCoordinator] > [GroupCoordinator 1]: Failed to write empty metadata for group > mae_processor: The coordinator is not available. > {code} > If we check partitions of __consumer_offsets it really has one partition with > insufficient ISR: > {code:java} > topic "__consumer_offsets" with 50 partitions: > partition 0, leader 1, replicas: 1,3,2, isrs: 1,2,3 > ... > partition 35, leader 3, replicas: 3,1,2, isrs: 1,2,3 > partition 36, leader 1, replicas: 1,3,2, isrs: 1 > partition 37, leader 2, replicas: 2,1,3, isrs: 1,2,3 > .... > partition 49, leader 2, replicas: 2,1,3, isrs: 1,2,3{code} > We wait some time but the issue didn't go, we still had one partition with > insufficient ISR. > First of all we [thought > |https://stackoverflow.com/questions/51491152/fixing-under-replicated-partitions-in-kafka/53540963#53540963]this > is issue with Kafka-ZooKeeper coordinations, so we restarted ZooKeeper > cluster and brokers 2 and 3, which didn't have ISR. +But it didn't help.+ > We also tried to manually ellect leader for this partition with > kafka-leader-election.sh (in hope it will help). +But it didn't help too.+ > In logs we also found an issue: > {code:java} > [2022-01-27T16:17:29,531][ERROR][category=kafka.server.ReplicaFetcherThread] > [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Unexpected error > occurred while processing data for partition __consumer_offsets-36 at offset > 19536 > kafka.common.OffsetsOutOfOrderException: Out of order offsets found in append > to __consumer_offsets-36: List(19536, 19536, 19537, 19538, 19539, 19540, > 19541, 19542, 19543, 19544, 19545, 19546, 19547, 19548, 19549, 19550, 19551, > 19552, 19553, 19554, 19555, 19556, 19557, 19558, 19559, 19560, 19561) > at kafka.log.Log.$anonfun$append$2(Log.scala:1126) > at kafka.log.Log.append(Log.scala:2349) > at kafka.log.Log.appendAsFollower(Log.scala:1036) > at > [2022-01-27T16:17:29,531][WARN][category=kafka.server.ReplicaFetcherThread] > [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Partition > __consumer_offsets-36 marked as failed > {code} > This looks like root cause, right? Can force shutdown Kafka process lead to > this issue? > Looks like a bug, moreover, shall Kafka handle case of corrupting data (if > it's the root cause of issue above)? -- This message was sent by Atlassian Jira (v8.20.10#820010)