[ https://issues.apache.org/jira/browse/KAFKA-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107569#comment-16107569 ]

Jiangjie Qin commented on KAFKA-5678:
-------------------------------------

[~Json Tu] [~cuiyang] DelayedProduce.tryComplete() completes the delayed
produce immediately when the leader replica is not local, so in the shutdown
case there should be no difference between calling forceComplete() and
calling tryComplete(). When the broker shuts down, every producer should
immediately receive a produce response with the NOT_LEADER_FOR_PARTITION
error code for all of its partitions.
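The completion rule described above can be sketched as a simplified model. This is not Kafka's actual DelayedProduce code: `PartitionStatus`, `PartitionError`, `DelayedProduceSketch` and their fields are hypothetical names for illustration. Only the rule itself (a non-local leader means an immediate NOT_LEADER_FOR_PARTITION answer; otherwise wait until the in-sync replicas reach the required offset) comes from the comment.

```java
import java.util.List;

// Hypothetical error codes for this sketch; only the name mirrors Kafka's error.
enum PartitionError { NONE, NOT_LEADER_FOR_PARTITION }

// Hypothetical per-partition state of an acks=-1 produce request.
final class PartitionStatus {
    boolean leaderIsLocal;      // is the leader replica for this partition on this broker?
    long highWatermark;         // offset that every in-sync replica has reached
    final long requiredOffset;  // offset the produced batch must reach to be acknowledged
    boolean pending = true;     // no answer decided yet for this partition
    PartitionError error = PartitionError.NONE;

    PartitionStatus(boolean leaderIsLocal, long highWatermark, long requiredOffset) {
        this.leaderIsLocal = leaderIsLocal;
        this.highWatermark = highWatermark;
        this.requiredOffset = requiredOffset;
    }
}

// Simplified model of the completion check (assumption: not the real implementation).
final class DelayedProduceSketch {
    final List<PartitionStatus> partitions;

    DelayedProduceSketch(List<PartitionStatus> partitions) {
        this.partitions = partitions;
    }

    // Returns true once every partition has an answer (success or error).
    boolean tryComplete() {
        for (PartitionStatus p : partitions) {
            if (!p.pending) continue;
            if (!p.leaderIsLocal) {
                // Leader is no longer on this broker: answer right away with an error.
                p.error = PartitionError.NOT_LEADER_FOR_PARTITION;
                p.pending = false;
            } else if (p.highWatermark >= p.requiredOffset) {
                // All in-sync replicas have the data: acknowledge successfully.
                p.pending = false;
            }
            // Otherwise keep waiting for the followers to catch up.
        }
        return partitions.stream().noneMatch(p -> p.pending);
    }
}
```

Under this model, once the leader has moved away, a tryComplete() call yields the same immediate NOT_LEADER_FOR_PARTITION answer that a forced completion would.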

One thing worth checking is that, during controlled shutdown, the controlled
shutdown request itself can sometimes take very long to complete, especially
when there are many requests pending in the broker. So it would be good to see
how long the controlled shutdown request itself took. This should be visible
in the request logger at DEBUG level.

> When the broker graceful shutdown occurs, the producer side sends timeout.
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-5678
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5678
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0.0, 0.10.0.0, 0.11.0.0
>            Reporter: tuyang
>
> Test environment:
> 1. Kafka version: 0.9.0.1
> 2. A cluster of 3 brokers with broker ids A, B, C.
> 3. A topic with 6 partitions and 2 replicas, with 2 leader partitions on
> each broker.
> We can reproduce the problem as follows:
> 1. Send messages as quickly as possible with acks=-1.
> 2. Partition p0's leader is on broker A, and we gracefully shut down broker
> A. If we send a message to p0 before the leader is re-elected, the message
> can be appended to the leader replica successfully, but if the follower
> replica does not catch up quickly enough, the shutting-down broker creates
> a DelayedProduce for this request that waits to complete until
> request.timeout.ms.
> 3. Because of the controlled shutdown request from broker A, the leader of
> p0 is re-elected, and the replica on broker A becomes a follower before the
> shutdown completes. The DelayedProduce is then not triggered to complete
> until it expires.
> 4. If the shutdown of broker A takes too long, the producer receives the
> response only after request.timeout.ms, which increases producer send
> latency when we restart the brokers one by one.
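The reported sequence can be sketched as a toy simulation. Everything here is hypothetical: `PurgatorySketch`, `DelayedOp`, `watch`, `checkAndComplete` and `advanceClock` are illustrative names, not Kafka's purgatory API, and times are simulated milliseconds. It contrasts the two paths discussed in this issue: if nothing re-checks the pending operation after leadership moves, it completes only at its deadline; if a check does run, the not-local-leader path answers immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical delayed-operation holder: a pending operation completes either when
// some event calls checkAndComplete(), or when its deadline passes.
final class PurgatorySketch {
    static final class DelayedOp {
        final long deadlineMs;        // e.g. request arrival time + request.timeout.ms
        boolean leaderIsLocal = true; // flips to false when leadership moves away
        boolean replicated = false;   // true once the follower has caught up
        Long completedAtMs = null;    // null while the operation is still pending

        DelayedOp(long deadlineMs) {
            this.deadlineMs = deadlineMs;
        }

        // Completes immediately if the leader moved away or replication finished.
        boolean tryComplete(long nowMs) {
            if (!leaderIsLocal || replicated) {
                completedAtMs = nowMs;
                return true;
            }
            return false;
        }
    }

    private final List<DelayedOp> pending = new ArrayList<>();

    // Try once up front; park the operation if it cannot complete yet.
    void watch(DelayedOp op, long nowMs) {
        if (!op.tryComplete(nowMs)) pending.add(op);
    }

    // Invoked by events (hypothetically a follower fetch or a leader change).
    void checkAndComplete(long nowMs) {
        pending.removeIf(op -> op.tryComplete(nowMs));
    }

    // Invoked as time passes; operations past their deadline complete as expired.
    void advanceClock(long nowMs) {
        pending.removeIf(op -> {
            if (nowMs >= op.deadlineMs) {
                op.completedAtMs = op.deadlineMs;
                return true;
            }
            return false;
        });
    }
}
```

For example, with a 30000 ms deadline, an operation whose leader moves at 100 ms but is never re-checked completes at 30000 ms, while the same operation re-checked via checkAndComplete(100) completes at 100 ms.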



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
