[ 
https://issues.apache.org/jira/browse/KAFKA-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jennifer Thompson updated KAFKA-7845:
-------------------------------------
    Summary: Kafka clients do not re-resolve ips when a broker is replaced.  
(was: NotLeaderForPartitionException error when publishing after a broker has 
died)

> Kafka clients do not re-resolve ips when a broker is replaced.
> --------------------------------------------------------------
>
>                 Key: KAFKA-7845
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7845
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.1.0
>            Reporter: Jennifer Thompson
>            Priority: Major
>
> When one of our Kafka brokers dies and a new one replaces it (via an AWS 
> ASG), the clients that publish to Kafka still try to publish to the old 
> brokers.
> We see errors like
> {code:java}
> 2019-01-18 20:16:16 WARN NetworkClient:721 - [Producer clientId=producer-1] 
> Connection to node 2 (/10.130.98.111:9092) could not be established. Broker 
> may not be available.
> 2019-01-18 20:19:09 WARN Sender:596 - [Producer clientId=producer-1] Got 
> error produce response with correlation id 3414 on topic-partition aa.pga-2, 
> retrying (4 attempts left). Error: NOT_LEADER_FOR_PARTITION
> 2019-01-18 20:19:09 WARN Sender:641 - [Producer clientId=producer-1] Received 
> invalid metadata error in produce request on partition aa.pga-2 due to 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.. Going to request metadata update now
> 2019-01-18 20:21:19 WARN NetworkClient:721 - [Producer clientId=producer-1] 
> Connection to node 2 (/10.130.98.111:9092) could not be established. Broker 
> may not be available.
> 2019-01-18 20:21:50 ERROR ProducerBatch:233 - Error executing user-provided 
> callback on message for topic-partition 'aa.test-liz-0'{code}
> and
> {code:java}
> [2019-01-18 20:28:47,732] ERROR WorkerSourceTask{id=rabbit-vpc-2-kafka-1} 
> Failed to flush, timed out while waiting for producer to flush outstanding 27 
> messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
> [2019-01-18 20:28:47,732] ERROR WorkerSourceTask{id=rabbit-vpc-2-kafka-1} 
> Failed to commit offsets 
> (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
> {code}
> The IP address referenced is for the broker that died. We have Kafka Manager 
> running as well, and that picks up the new broker.
> This started happening after we upgraded to 2.1. When we had Kafka 1.1, 
> brokers could fail over without a problem.
> One thing that might be considered unusual about our deployment is that we 
> reuse the same broker id and EBS volume for the new broker, so that 
> partitions do not have to be reassigned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
