[ https://issues.apache.org/jira/browse/KAFKA-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jennifer Thompson updated KAFKA-7845:
-------------------------------------
    Summary: Kafka clients do not re-resolve ips when a broker is replaced.  (was: NotLeaderForPartitionException error when publishing after a broker has died)

> Kafka clients do not re-resolve ips when a broker is replaced.
> --------------------------------------------------------------
>
> Key: KAFKA-7845
> URL: https://issues.apache.org/jira/browse/KAFKA-7845
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Affects Versions: 2.1.0
> Reporter: Jennifer Thompson
> Priority: Major
>
> When one of our Kafka brokers dies and a new one replaces it (via an AWS
> ASG), the clients that publish to Kafka still try to publish to the old
> broker.
> We see errors like
> {code:java}
> 2019-01-18 20:16:16 WARN NetworkClient:721 - [Producer clientId=producer-1] Connection to node 2 (/10.130.98.111:9092) could not be established. Broker may not be available.
> 2019-01-18 20:19:09 WARN Sender:596 - [Producer clientId=producer-1] Got error produce response with correlation id 3414 on topic-partition aa.pga-2, retrying (4 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2019-01-18 20:19:09 WARN Sender:641 - [Producer clientId=producer-1] Received invalid metadata error in produce request on partition aa.pga-2 due to org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Going to request metadata update now
> 2019-01-18 20:21:19 WARN NetworkClient:721 - [Producer clientId=producer-1] Connection to node 2 (/10.130.98.111:9092) could not be established. Broker may not be available.
> 2019-01-18 20:21:50 ERROR ProducerBatch:233 - Error executing user-provided callback on message for topic-partition 'aa.test-liz-0'
> {code}
> and
> {code:java}
> [2019-01-18 20:28:47,732] ERROR WorkerSourceTask{id=rabbit-vpc-2-kafka-1} Failed to flush, timed out while waiting for producer to flush outstanding 27 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
> [2019-01-18 20:28:47,732] ERROR WorkerSourceTask{id=rabbit-vpc-2-kafka-1} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
> {code}
> The IP address referenced (10.130.98.111) belongs to the broker that died.
> We also run Kafka Manager, and it does pick up the new broker.
> This started happening after we upgraded to 2.1. When we were on Kafka 1.1,
> brokers could fail over without a problem.
> One thing that might be considered unusual about our deployment is that we
> reuse the same broker id and EBS volume for the new broker, so that
> partitions do not have to be reassigned.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
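The reported behavior, where the producer keeps dialing 10.130.98.111 after broker 2 is replaced under the same broker id, is consistent with a client that caches a broker's resolved address per node id instead of re-resolving it on reconnect. The Java sketch below is a hypothetical illustration of that difference only; it is not Kafka's actual NetworkClient code. The class names, the hostname `broker-2.example.internal`, the replacement IP `10.130.98.222`, and the in-memory map standing in for DNS are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch, NOT Kafka's NetworkClient code.
 * Contrasts a client that caches a broker's resolved IP per node id with one
 * that re-resolves the broker's host on every connection attempt.
 */
public class BrokerAddressCacheDemo {

    /** Resolves once per node id and reuses that IP forever (stale after replacement). */
    static final class CachingConnector {
        private final Function<String, String> resolver;
        private final Map<Integer, String> ipByNodeId = new HashMap<>();

        CachingConnector(Function<String, String> resolver) {
            this.resolver = resolver;
        }

        String addressFor(int nodeId, String host) {
            // Only the first call for a given node id consults the resolver.
            return ipByNodeId.computeIfAbsent(nodeId, id -> resolver.apply(host));
        }
    }

    /** Re-resolves the host each time, so a changed DNS record is picked up. */
    static final class ReResolvingConnector {
        private final Function<String, String> resolver;

        ReResolvingConnector(Function<String, String> resolver) {
            this.resolver = resolver;
        }

        String addressFor(int nodeId, String host) {
            // nodeId is ignored on purpose: the address depends only on current DNS.
            return resolver.apply(host);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for DNS so the demo needs no network access.
        Map<String, String> dns = new HashMap<>();
        String host = "broker-2.example.internal"; // hypothetical hostname
        dns.put(host, "10.130.98.111");

        CachingConnector caching = new CachingConnector(dns::get);
        ReResolvingConnector reResolving = new ReResolvingConnector(dns::get);

        System.out.println(caching.addressFor(2, host));     // 10.130.98.111
        System.out.println(reResolving.addressFor(2, host)); // 10.130.98.111

        // The ASG replaces broker 2: same broker id, new IP (hypothetical value).
        dns.put(host, "10.130.98.222");

        System.out.println(caching.addressFor(2, host));     // 10.130.98.111 (stale)
        System.out.println(reResolving.addressFor(2, host)); // 10.130.98.222
    }
}
```

Re-resolving costs a lookup per connection attempt; caching avoids that cost, but only stays correct if the cache is invalidated when a broker's address can change, as it does when an instance is replaced while keeping its broker id.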