[jira] [Commented] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

Daniel Urban (Jira) Fri, 18 Aug 2023 00:24:04 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755833#comment-17755833
 ]


Daniel Urban commented on KAFKA-15372:
--------------------------------------

[~gharris1727] I'm not sure I follow this part: "should forward configurations 
to the leader via the internal REST API."

I checked org.apache.kafka.connect.mirror.MirrorMaker#configureConnector, which 
calls 
org.apache.kafka.connect.runtime.distributed.DistributedHerder#putConnectorConfig,
and I don't see any sign of forwarding to the leader. The validation callback 
explicitly fails when the worker is not the leader:
{code:java}
if (!isLeader()) {
    callback.onCompletion(new NotLeaderException("Only the leader can set connector configs.", leaderUrl()), null);
    return null;
}
{code}
So I think current trunk is also affected: there is no Connector 
configuration forwarding to the leader in MM2. Additionally, I'm not sure a 
single forward attempt would be enough to ensure correctness, but that is an 
implementation detail.
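
To illustrate what "more than a single attempt" could look like, here is a 
minimal sketch. It is not Kafka code: RetryingConfigUpdate, submitWithRetry and 
the trySubmit supplier are hypothetical names, with trySubmit standing in for 
one putConnectorConfig attempt that reports whether it was accepted. It retries 
locally instead of forwarding to the leader, purely to keep the example 
self-contained.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch, not part of Kafka: bounded retry of a config update. */
public class RetryingConfigUpdate {

    /**
     * Calls trySubmit until it reports success or maxAttempts is reached.
     *
     * @param trySubmit   assumed stand-in for one update attempt; returns true
     *                    when accepted, false when rejected (e.g. not leader)
     * @param maxAttempts upper bound on the number of attempts
     * @param backoffMs   fixed pause between attempts, in milliseconds
     * @return true if any attempt was accepted
     */
    public static boolean submitWithRetry(BooleanSupplier trySubmit,
                                          int maxAttempts,
                                          long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (trySubmit.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

With a bound like this, the update survives a leadership change only if the 
worker becomes leader within the retry window, which is why the number of 
attempts matters for correctness.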

Unfortunately, I don't have an exact reproduction, but I saw this happen in 
an actual cluster; the leadership changes occurred as I detailed in the 
ticket description.
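
In place of a real reproduction, the sequence from the ticket description can 
be modelled with a small toy simulation. This is only a sketch under stated 
assumptions: RollingRestartSimulation and configApplied are hypothetical names, 
instances are plain integers, and who leads after each restart is passed in 
rather than elected.

```java
/** Hypothetical toy model of the rolling-restart scenario; not Kafka code. */
public class RollingRestartSimulation {

    /**
     * Replays a rolling restart in which each restarted instance makes exactly
     * one attempt to update the Connector configuration.
     *
     * @param restartOrder       instance ids, in the order they are restarted
     * @param leaderAfterRestart assumed leader of the flow's Connect group
     *                           right after the corresponding restart
     * @return true if at least one one-time attempt was made by the leader
     */
    public static boolean configApplied(int[] restartOrder,
                                        int[] leaderAfterRestart) {
        if (restartOrder.length != leaderAfterRestart.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        boolean applied = false;
        for (int i = 0; i < restartOrder.length; i++) {
            int restarted = restartOrder[i];
            int leader = leaderAfterRestart[i];
            // Mirrors the isLeader() check: only the leader's attempt succeeds.
            if (restarted == leader) {
                applied = true;
            }
        }
        return applied;
    }
}
```

For the ticket's example, configApplied(new int[]{1, 2}, new int[]{2, 1}) 
models instance 1 restarting while instance 2 leads, then instance 2 restarting 
while instance 1 leads. Neither attempt is made by the leader, so the update is 
never applied.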

> MM2 rolling restart can drop configuration changes silently
> -----------------------------------------------------------
>
>                 Key: KAFKA-15372
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15372
>             Project: Kafka
>          Issue Type: Improvement
>          Components: mirrormaker
>            Reporter: Daniel Urban
>            Priority: Major
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time attempt, and it fails if the Connect worker is not 
> the leader of the group.
> In a distributed setup with a rolling restart, it is possible that, for a 
> specific flow, the Connect worker of the just-restarted MM2 instance is not 
> the leader, meaning that Connector configuration changes can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted; the worker inside MM2 instance 2 becomes the 
> leader of the A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1).
>  # MM2 instance 2 is restarted; leadership moves to the worker in MM2 
> instance 1.
>  # MM2 instance 2 tries to update the Connector configurations, but fails.
> At this point, the configuration changes made before the restart are never 
> applied. This often happens silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
