[ https://issues.apache.org/jira/browse/KAFKA-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander updated KAFKA-13384:
------------------------------
Description:
We found a misbehavior on our Kafka cluster (version: 2.6.2 (Commit:da65af02e5856e34)): the `FailedPartitionsCount` metric is not updated when a partition log file is corrupted.

Steps to reproduce the problem:
# Corrupt a partition log file.
# Restart the Kafka process.

After that, the broker correctly logs that it marked the corrupted partitions as failed:

{code:java}
2021-10-19T14:49:31+02:00 [2021-10-19 12:49:30,924] WARN [ReplicaFetcher replicaId=11, leaderId=10, fetcherId=0] Partition test_topic-1 marked as failed (kafka.server.ReplicaFetcherThread)
{code}

However, the `FailedPartitionsCount` metric still reports 0 (see the attached screenshot).

> FailedPartitionsCount metric is not updated if a partition log file was corrupted
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-13384
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13384
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.6.2
>         Environment: OS:
> NAME="Amazon Linux AMI"
> VERSION="2018.03"
> ID="amzn"
> ID_LIKE="rhel fedora"
> VERSION_ID="2018.03"
> PRETTY_NAME="Amazon Linux AMI 2018.03"
> CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
> HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
> Kafka version:
> 2.6.2 (Commit:da65af02e5856e34)
>            Reporter: Alexander
>            Priority: Major
>         Attachments: Screenshot 2021-10-19 at 15.28.33.png

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
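To check the reported value without a dashboard, the gauge can be read directly over JMX. The sketch below is a minimal example under several assumptions: the MBean name `kafka.server:type=ReplicaFetcherManager,name=FailedPartitionsCount,clientId=Replica` is assumed to match what the Kafka monitoring documentation lists for this metric, the gauge is assumed to expose its reading through a `Value` attribute (as Yammer-metrics gauges do over JMX), and the endpoint `localhost:9999` is hypothetical (the broker must be started with remote JMX enabled, e.g. via `JMX_PORT`). The class name `FailedPartitionsCountCheck` is also hypothetical.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper for reading the FailedPartitionsCount gauge from a broker.
final class FailedPartitionsCountCheck {
    // Assumed MBean name of the replica fetcher's failed-partitions gauge.
    static final String FAILED_PARTITIONS_MBEAN =
        "kafka.server:type=ReplicaFetcherManager,name=FailedPartitionsCount,clientId=Replica";

    // Reads the gauge's "Value" attribute through any MBean server connection
    // (a remote JMX connection or a local MBeanServer).
    static long readFailedPartitionsCount(MBeanServerConnection conn) throws Exception {
        Object value = conn.getAttribute(new ObjectName(FAILED_PARTITIONS_MBEAN), "Value");
        return ((Number) value).longValue();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default endpoint; pass "host:port" as the first argument.
        String hostPort = args.length > 0 ? args[0] : "localhost:9999";
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + hostPort + "/jmxrmi");
        // JMXConnector is Closeable, so try-with-resources closes the connection.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            long failed = readFailedPartitionsCount(connector.getMBeanServerConnection());
            System.out.println("FailedPartitionsCount = " + failed);
        }
    }
}
```

Running this against broker `replicaId=11` after the restart described above should show whether the gauge stays at 0 while the log reports `Partition test_topic-1 marked as failed`. Taking an `MBeanServerConnection` parameter keeps the read logic independent of how the connection is obtained.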