[ 
https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas reassigned KAFKA-13177:
----------------------------------------

    Assignee: kaushik srinivas

> partition failures and fewer shrink but a lot of isr expansions with 
> increased num.replica.fetchers in kafka brokers
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13177
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13177
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: kaushik srinivas
>            Assignee: kaushik srinivas
>            Priority: Major
>
> Setup: 3-node Kafka broker cluster on k8s (4 CPU cores and 4Gi memory per broker)
> topics: 15, partitions per topic: 15, replication factor: 3, min.insync.replicas: 2
> producers running with acks=all
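> For reference, a minimal sketch of how one such topic and the producers could be
> configured to match the setup above (the bootstrap address is a placeholder, not
> taken from this deployment):
>
> # create one of the topics: 15 partitions, RF 3, min ISR 2
> kafka-topics.sh --create --bootstrap-server broker:9092 \
>   --topic test --partitions 15 --replication-factor 3 \
>   --config min.insync.replicas=2
>
> # producer config: wait for acknowledgement from all in-sync replicas
> acks=all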
> Initially num.replica.fetchers was set to 1 (the default) and we observed
> very frequent ISR shrinks and expansions, so the brokers were tuned to a
> higher value of 4.
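> That change amounts to the following broker-side setting (sketched here for
> clarity; not the actual server.properties from this cluster):
>
> # server.properties on each broker: replica fetcher threads per source broker
> # (raised from the default of 1)
> num.replica.fetchers=4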
> After this change was made, we see the behavior and warning messages below in
> the broker logs:
>  # Over a period of 2 days, there were around 10 ISR shrinks corresponding to 10
> partitions, but around 700 ISR expansions corresponding to almost all
> partitions in the cluster (approx 50 to 60 partitions).
>  # we see frequent WARN messages of partitions being marked as failed in the
> same time span. Below is the trace: {"type":"log", "host":"wwwwww",
> "level":"WARN", "neid":"kafka-wwwwww", "system":"kafka",
> "time":"2021-08-03T20:09:15.340", "timezone":"UTC",
> "log":{"message":"ReplicaFetcherThread-2-1003 -
> kafka.server.ReplicaFetcherThread - [ReplicaFetcher replicaId=1001,
> leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}
>  
> We have seen the above behavior continuously since increasing
> num.replica.fetchers from 1 to 4. We increased it to improve replication
> performance and thereby reduce the ISR shrinks, but we see this strange
> behavior after the change. What does the above trace indicate? Is a partition
> being marked as failed just a WARN message that Kafka handles on its own, or
> is it something to worry about?
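> Not part of the original question, but one quick way to check whether any
> partitions are actually falling out of sync while this is happening (the
> bootstrap address is again a placeholder):
>
> # list partitions whose ISR is currently smaller than the replica set
> kafka-topics.sh --describe --under-replicated-partitions \
>   --bootstrap-server broker:9092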



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
