[
https://issues.apache.org/jira/browse/KAFKA-16951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856057#comment-17856057
]
Graham Campbell commented on KAFKA-16951:
-----------------------------------------
Yes, if the original coordinator is online, the transactional request will
either succeed as normal if leader election has happened for the relevant
__transaction_state partition or quickly return a NOT_COORDINATOR error.
I've made an attempt to generalize the handleServerDisconnect method used by
the MetadataUpdater into a more general interface in the linked PR.
Related to this ticket, I also opened KAFKA-16902 to use the
socket.connection.setup.timeout.ms config to reduce the impact of attempting
reconnection.
> TransactionManager should rediscover coordinator on disconnection
> -----------------------------------------------------------------
>
> Key: KAFKA-16951
> URL: https://issues.apache.org/jira/browse/KAFKA-16951
> Project: Kafka
> Issue Type: Improvement
> Components: clients, producer
> Affects Versions: 3.7.0
> Reporter: Graham Campbell
> Priority: Major
>
> When a transaction coordinator for a transactional client shuts down for
> restart or due to failure, the NetworkClient notices the broker disconnection
> and [will automatically refresh cluster
> metadata|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1182-L1183]
> to get the latest partition assignments.
> The TransactionManager does not notice any changes until the next
> transactional request. If the broker is still offline, this is a [blocking
> wait while the client attempts to reconnect to the old
> coordinator|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L489-L490],
> which can be up to request.timeout.ms long (default 30 seconds). Coordinator
> lookup is only performed after a transactional request times out and fails.
> The lookup is triggered in either the [Sender|#L525-L528]
> or
> [TransactionManager's|https://github.com/apache/kafka/blob/f380cd1b64134cf81e5dab16d71a276781de890e/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L1225-L1229]
> error handling.
> To support faster recovery and faster reaction to transaction coordinator
> reassignments, the TransactionManager should proactively look up the
> transaction coordinator whenever the client is disconnected from the current
> transaction coordinator.
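
The behavior proposed in the description could be sketched roughly as follows. This is a toy model under stated assumptions, not the code from the linked PR: DisconnectListener, ToyNetworkClient, and ToyTransactionManager are hypothetical names invented for illustration, and none of them are actual Kafka client classes. The sketch only shows the idea of forwarding a disconnect event so that the coordinator is marked unknown immediately, instead of after a transactional request has timed out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical callback, loosely modeled on the handleServerDisconnect hook
// mentioned in the comment; this is NOT the actual Kafka client interface.
interface DisconnectListener {
    void onDisconnect(String nodeId);
}

// Toy stand-in for the network layer: it only fans out disconnect events.
class ToyNetworkClient {
    private final List<DisconnectListener> listeners = new ArrayList<>();

    void addListener(DisconnectListener listener) {
        listeners.add(listener);
    }

    // Simulates the client noticing that the connection to a broker was lost.
    void simulateDisconnect(String nodeId) {
        for (DisconnectListener listener : listeners) {
            listener.onDisconnect(nodeId);
        }
    }
}

// Toy stand-in for the transaction manager: it tracks the known coordinator
// and whether a coordinator lookup should be issued next.
class ToyTransactionManager implements DisconnectListener {
    private String coordinatorNodeId; // null means "coordinator unknown"
    private boolean lookupPending = false;

    ToyTransactionManager(String coordinatorNodeId) {
        this.coordinatorNodeId = coordinatorNodeId;
    }

    @Override
    public void onDisconnect(String nodeId) {
        // React only when the lost node is the current coordinator.
        if (nodeId.equals(coordinatorNodeId)) {
            coordinatorNodeId = null;
            lookupPending = true; // rediscover now, not after a request timeout
        }
    }

    Optional<String> coordinator() {
        return Optional.ofNullable(coordinatorNodeId);
    }

    boolean lookupPending() {
        return lookupPending;
    }
}
```

In this model, a disconnect from any other broker leaves the coordinator untouched, while a disconnect from the coordinator itself clears it and flags a lookup. How the real client would schedule that lookup is left out here.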