[ https://issues.apache.org/jira/browse/KAFKA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774631#comment-17774631 ]
shilin Lu commented on KAFKA-13639:
-----------------------------------
Is there any progress on this issue? [~ijuma] [~zhaohaidao] [~mrMigles]
> NotEnoughReplicasException for __consumer_offsets topic due to out of order
> offset
> ----------------------------------------------------------------------------------
>
> Key: KAFKA-13639
> URL: https://issues.apache.org/jira/browse/KAFKA-13639
> Project: Kafka
> Issue Type: Bug
> Components: core, log
> Affects Versions: 2.6.2
> Reporter: Sergey Ivanov
> Priority: Major
>
> Hello,
> We faced a strange issue with Kafka while testing failover scenarios: the test
> force-shuts-down the nodes where the Kafka pods are placed (Kafka is deployed
> on Kubernetes) and then brings those nodes back.
> After this the Kafka pods start normally, but +some+ consumers could not
> connect to the cluster, failing with errors:
>
> {code:java}
> [2022-01-27T14:35:09.051][level=DEBUG][class=kafka_client:utils.go:120]:
> Failed to sync group mae_processor: [15] Group Coordinator Not Available: the
> broker returns this error code for group coordinator requests, offset
> commits, and most group management requests if the offsets topic has not yet
> been created, or if the group coordinator is not active{code}
>
>
> It looked like there were issues with the ___consumer_offsets_ topic. In the
> broker logs we found this error:
> {code:java}
> [2022-01-27T14:56:00,233][INFO][category=kafka.coordinator.group.GroupCoordinator]
> [GroupCoordinator 1]: Group mae_processor with generation 329 is now empty
> (__consumer_offsets-36)
> [2022-01-27T14:56:00,233][ERROR][category=kafka.server.ReplicaManager]
> [ReplicaManager broker=1] Error processing append operation on partition
> __consumer_offsets-36
> org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the
> current ISR Set(1) is insufficient to satisfy the min.isr requirement of 2
> for partition __consumer_offsets-36
> [2022-01-27T14:56:00,233][WARN][category=kafka.coordinator.group.GroupCoordinator]
> [GroupCoordinator 1]: Failed to write empty metadata for group
> mae_processor: The coordinator is not available.
> {code}
> If we check the partitions of __consumer_offsets, one partition indeed has an
> insufficient ISR:
> {code:java}
> topic "__consumer_offsets" with 50 partitions:
> partition 0, leader 1, replicas: 1,3,2, isrs: 1,2,3
> ...
> partition 35, leader 3, replicas: 3,1,2, isrs: 1,2,3
> partition 36, leader 1, replicas: 1,3,2, isrs: 1
> partition 37, leader 2, replicas: 2,1,3, isrs: 1,2,3
> ....
> partition 49, leader 2, replicas: 2,1,3, isrs: 1,2,3{code}
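> For reference, partitions whose ISR has fallen below min.insync.replicas can
> also be listed directly with the stock tooling (a sketch; the bootstrap
> address is a placeholder for your cluster):
> {code:bash}
> # List partitions with fewer in-sync replicas than min.insync.replicas
> bin/kafka-topics.sh --bootstrap-server localhost:9092 \
>   --describe --under-min-isr-partitions
> {code}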
> We waited some time, but the issue didn't go away; we still had one partition
> with an insufficient ISR.
> At first we
> [thought|https://stackoverflow.com/questions/51491152/fixing-under-replicated-partitions-in-kafka/53540963#53540963]
> this was an issue with Kafka-ZooKeeper coordination, so we restarted the
> ZooKeeper cluster and brokers 2 and 3, which were missing from the ISR. +But
> it didn't help.+
> We also tried to manually elect a leader for this partition with
> kafka-leader-election.sh (hoping it would help). +But that didn't help
> either.+
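> For completeness, the election we attempted looked roughly like this (a
> sketch; the bootstrap address is a placeholder for your cluster):
> {code:bash}
> # Re-trigger preferred leader election for the stuck partition only
> bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
>   --election-type PREFERRED --topic __consumer_offsets --partition 36
> {code}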
> In the logs we also found this error:
> {code:java}
> [2022-01-27T16:17:29,531][ERROR][category=kafka.server.ReplicaFetcherThread]
> [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Unexpected error
> occurred while processing data for partition __consumer_offsets-36 at offset
> 19536
> kafka.common.OffsetsOutOfOrderException: Out of order offsets found in append
> to __consumer_offsets-36: List(19536, 19536, 19537, 19538, 19539, 19540,
> 19541, 19542, 19543, 19544, 19545, 19546, 19547, 19548, 19549, 19550, 19551,
> 19552, 19553, 19554, 19555, 19556, 19557, 19558, 19559, 19560, 19561)
> at kafka.log.Log.$anonfun$append$2(Log.scala:1126)
> at kafka.log.Log.append(Log.scala:2349)
> at kafka.log.Log.appendAsFollower(Log.scala:1036)
> at
> [2022-01-27T16:17:29,531][WARN][category=kafka.server.ReplicaFetcherThread]
> [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Partition
> __consumer_offsets-36 marked as failed
> {code}
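> If it helps with diagnosis, the raw record batches (base and last offsets) of
> the affected segment on each replica can be inspected with kafka-dump-log.sh
> (a sketch; the segment path is a placeholder for whichever segment covers
> offset 19536 on your brokers):
> {code:bash}
> # Print batch metadata for the suspect __consumer_offsets-36 segment
> bin/kafka-dump-log.sh \
>   --files /var/kafka-logs/__consumer_offsets-36/00000000000000000000.log
> {code}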
> This looks like the root cause, right? Can a forced shutdown of the Kafka
> process lead to this issue?
> It looks like a bug; moreover, shouldn't Kafka handle the case of corrupted
> data (if that is indeed the root cause of the issue above)?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)