[ 
https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696464#comment-13696464
 ] 

Jun Rao commented on KAFKA-955:
-------------------------------

There seem to be several ways this can happen. (a) In the scenario above, 
after the leaders fail over, sends for topic X cause new sockets to be 
established. Topic Y then uses a newly established socket without realizing 
that the leader for topic Y has changed. (b) When we fetch the metadata for a 
topic, we fetch the metadata for all partitions. Suppose we never get to send 
any data to a particular partition. No socket is established for that 
partition, since SyncProducer makes socket connections lazily on first send. 
The leader for the partition then changes. When the producer finally sends a 
message to that partition, a socket is established to the wrong leader 
without the producer realizing it.
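
Case (b) can be illustrated with a small simulation. This is only a sketch 
under simplifying assumptions: the Broker and LazyProducer classes and the 
"topicY-0" partition name are hypothetical stand-ins, not Kafka code, and 
connections are modeled as plain object references rather than sockets.

```python
class Broker:
    """Hypothetical broker that silently drops ack=0 requests it cannot serve."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.leads = set()  # partitions this broker currently leads
        self.log = []       # (partition, message) pairs actually appended

    def handle_produce(self, partition, message):
        # With ack=0 no response is sent, so a rejection is invisible
        # to the client; the error would only show up in the broker log.
        if partition in self.leads:
            self.log.append((partition, message))


class LazyProducer:
    """Hypothetical producer that caches metadata and connects on first send."""

    def __init__(self, brokers, metadata):
        self.brokers = brokers          # broker_id -> Broker
        self.metadata = dict(metadata)  # cached partition -> leader id
        self.connections = {}           # broker_id -> Broker, opened lazily

    def send(self, partition, message):
        leader_id = self.metadata[partition]  # may be stale
        if leader_id not in self.connections:
            # The connection is opened only now, using the cached leader.
            self.connections[leader_id] = self.brokers[leader_id]
        self.connections[leader_id].handle_produce(partition, message)


b1, b2 = Broker(1), Broker(2)
b1.leads.add("topicY-0")
producer = LazyProducer({1: b1, 2: b2}, {"topicY-0": 1})

# The leader moves from broker 1 to broker 2 before the first send.
b1.leads.discard("topicY-0")
b2.leads.add("topicY-0")

producer.send("topicY-0", "hello")
message_lost = not b1.log and not b2.log
```

In this model the producer connects to broker 1 for the first time after the 
leader has already moved, and nothing tells it that the message was dropped.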

In general, if a produce request with ack=0 hits any error, the producer 
currently won't notice it. This happens, for example, when the broker hits a 
MessageTooLargeException or any other unexpected exception. In those cases, 
forwarding the request will not help. Forwarding requests will also 
complicate the logic in the broker, since it has to figure out the other 
broker's host and port, and potentially cache socket connections to other 
brokers.

An alternative solution is to simply close the socket connection when we hit 
any error for produce requests with ack=0. This way, the producer will realize 
the error and can choose to resend if desired.
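
A rough sketch of this alternative, again as a simulation rather than Kafka 
code. The Connection and Broker classes are hypothetical, the socket is 
modeled as a flag, and the metadata refresh is not modeled. It also assumes 
the producer notices the closed connection on its next write, so the request 
that triggered the close is only recovered if the producer kept a copy.

```python
class Broker:
    """Hypothetical broker that closes the connection on any ack=0 error."""

    def __init__(self, broker_id, leads=()):
        self.broker_id = broker_id
        self.leads = set(leads)  # partitions this broker currently leads
        self.log = []            # (partition, message) pairs appended

    def handle_produce(self, conn, partition, message):
        if partition not in self.leads:
            # There is no ack to carry the error, so signal it by closing.
            conn.closed = True
            return
        self.log.append((partition, message))


class Connection:
    """Hypothetical stand-in for a producer-side socket to one broker."""

    def __init__(self, broker):
        self.broker = broker
        self.closed = False

    def write(self, partition, message):
        if self.closed:
            raise ConnectionError("connection closed by broker")
        self.broker.handle_produce(self, partition, message)


old_leader = Broker(1)
new_leader = Broker(2, leads={"mytopic-0"})

conn = Connection(old_leader)
conn.write("mytopic-0", "m1")  # rejected; the broker closes the connection

noticed = False
try:
    conn.write("mytopic-0", "m2")
except ConnectionError:
    # The producer now knows something went wrong and can reconnect
    # to the current leader (after a metadata refresh) and resend.
    noticed = True
    conn = Connection(new_leader)
    conn.write("mytopic-0", "m2")
```

Here "m2" reaches the new leader after the reconnect, while "m1" is still 
lost because this sketch does not buffer unacknowledged messages.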

                
> After a leader change, messages sent with ack=0 are lost
> --------------------------------------------------------
>
>                 Key: KAFKA-955
>                 URL: https://issues.apache.org/jira/browse/KAFKA-955
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>
> If the leader changes for a partition, and a producer is sending messages 
> with ack=0, then messages will be lost, since the producer has no active way 
> of knowing that the leader has changed until its next metadata refresh 
> update.
> The broker receiving the message, which is no longer the leader, logs a 
> message like this:
> Produce request with correlation id 7136261 from client  on partition 
> [mytopic,0] failed due to Leader not local for partition [mytopic,0] on 
> broker 508818741
> This is exacerbated by the controlled shutdown mechanism, which forces an 
> immediate leader change.
> A possible solution would be for a broker that receives a message for a 
> topic it is no longer the leader for (when the ack level is 0) to silently 
> forward the message to the current leader.

