[ https://issues.apache.org/jira/browse/KAFKA-15468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justine Olshan updated KAFKA-15468:
-----------------------------------
    Description: 
I was doing some research on transaction coordinator loading and found that during a single roll, a coordinator was loaded up to 5x on a single broker! (For reference, it should only load once on the preferred leader and on any temporary leaders when the broker is down.)

I was looking into TopicDelta and saw the check it uses to identify “new leaders”: (prevPartition == null || prevPartition.partitionEpoch != entry.getValue().partitionEpoch). I don’t think this is correct, because the partition epoch can change for reasons other than becoming the leader (i.e. ISR/follower changes).

Here is some more information on the scenario I encountered (the coordinator was on broker id 1):

6 Sep 2023 @ 09:42:55.782 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 114 milliseconds, of which 0 milliseconds was spent in the scheduler.
6 Sep 2023 @ 09:45:41.328 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 30 milliseconds, of which 0 milliseconds was spent in the scheduler.
6 Sep 2023 @ 09:49:42.863 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 990 milliseconds, of which 2 milliseconds was spent in the scheduler. (correct load)
6 Sep 2023 @ 09:51:10.868 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 182 milliseconds, of which 144 milliseconds was spent in the scheduler.
6 Sep 2023 @ 09:53:53.576 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 177 milliseconds, of which 143 milliseconds was spent in the scheduler.

Following the logs, I found:
1. kafka-3 shuts down and is removed from the ISR
2. kafka-3 restarts and rejoins the ISR
3. kafka-1 shuts down and unloads, then restarts and loads (correct)
4. kafka-2 shuts down and is removed from the ISR
5. kafka-2 rejoins the ISR

There are two aspects to this problem:
1. TopicDelta shows a change whenever the partition epoch changes, but the partition epoch can change even when the leader does not (see the first sketch below).
2. A leader epoch can change without a new leader being elected. In this case, we should check whether the transaction coordinator has already loaded the partition's state to avoid reloads (see the second sketch below).

The new group coordinator already checks whether the state is loaded (indicated by a running coordinator state), so we don't see the same problem there.
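
To illustrate the first aspect, here is a minimal sketch of what a leader-based check could look like. This is not the actual TopicDelta code; PartitionState is a hypothetical stand-in for the real partition registration, reduced to the fields relevant here.

// Hypothetical, simplified partition state (the real metadata image class
// has many more fields than shown here).
record PartitionState(int leader, int leaderEpoch, int partitionEpoch) {}

final class LeaderChangeCheck {
    // Flag the partition as a "new local leader" only when leadership actually
    // moves to this broker, not on every partition epoch bump (which also
    // happens for ISR/follower-only changes).
    static boolean becameLocalLeader(PartitionState prev, PartitionState next, int localBrokerId) {
        if (next.leader() != localBrokerId) {
            return false;   // not the leader in the new image
        }
        if (prev == null) {
            return true;    // partition newly appears with this broker as leader
        }
        return prev.leader() != localBrokerId;   // only a real leadership change counts
    }
}

For the second aspect, here is a sketch of the kind of "already loaded" guard described above, loosely modeled on the new group coordinator's behavior. The class and method names are illustrative only, not the real TransactionStateManager API.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: skip reloads for partitions whose transaction state
// is already loaded on this broker.
class CoordinatorLoadGuard {
    private final Set<Integer> loadedPartitions = ConcurrentHashMap.newKeySet();

    // Run loadFn only if the partition is not already loaded, so a leader
    // epoch bump that keeps leadership on this broker does not trigger
    // another expensive reload from __transaction_state.
    void onBecomeLeader(int partition, Runnable loadFn) {
        if (loadedPartitions.add(partition)) {
            loadFn.run();
        }
    }

    // Called when leadership moves away, so a later re-election loads again.
    void onResign(int partition) {
        loadedPartitions.remove(partition);
    }
}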

  was:
I was doing some research on transaction coordinator loading and found that during a single roll, a coordinator was loaded up to 5x on a single broker! (For reference, it should only load once on the preferred leader and on any temporary leaders when the broker is down.)

I was looking into TopicDelta and saw the check it uses to identify “new leaders”: (prevPartition == null || prevPartition.partitionEpoch != entry.getValue().partitionEpoch). I don’t think this is correct, because the partition epoch can change for reasons other than becoming the leader (i.e. ISR/follower changes).

Here is some more information on the scenario I encountered (the coordinator was on broker id 1):

6 Sep 2023 @ 09:42:55.782 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 114 milliseconds, of which 0 milliseconds was spent in the scheduler.
6 Sep 2023 @ 09:45:41.328 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 30 milliseconds, of which 0 milliseconds was spent in the scheduler.
6 Sep 2023 @ 09:49:42.863 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 990 milliseconds, of which 2 milliseconds was spent in the scheduler. (correct load)
6 Sep 2023 @ 09:51:10.868 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 182 milliseconds, of which 144 milliseconds was spent in the scheduler.
6 Sep 2023 @ 09:53:53.576 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 177 milliseconds, of which 143 milliseconds was spent in the scheduler.

Following the logs, I found:
1. kafka-3 shuts down and is removed from the ISR
2. kafka-3 restarts and rejoins the ISR
3. kafka-1 shuts down and unloads, then restarts and loads (correct)
4. kafka-2 shuts down and is removed from the ISR
5. kafka-2 rejoins the ISR

There are two aspects to this problem:
1. TopicDelta shows a change whenever the partition epoch changes, but the partition epoch can change even when the leader does not.
2. A leader epoch can change without a new leader being elected. In this case, we should check whether the transaction coordinator has already loaded the partition's state to avoid reloads.


> Prevent transaction coordinator reloads on already loaded leaders
> -----------------------------------------------------------------
>
>                 Key: KAFKA-15468
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15468
>             Project: Kafka
>          Issue Type: Task
>            Reporter: Justine Olshan
>            Assignee: Justine Olshan
>            Priority: Major
>
> I was doing some research on transaction coordinator loading and found that during a single roll, a coordinator was loaded up to 5x on a single broker! (For reference, it should only load once on the preferred leader and on any temporary leaders when the broker is down.)
>
> I was looking into TopicDelta and saw the check it uses to identify “new leaders”: (prevPartition == null || prevPartition.partitionEpoch != entry.getValue().partitionEpoch). I don’t think this is correct, because the partition epoch can change for reasons other than becoming the leader (i.e. ISR/follower changes).
>
> Here is some more information on the scenario I encountered (the coordinator was on broker id 1):
>
> 6 Sep 2023 @ 09:42:55.782 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 114 milliseconds, of which 0 milliseconds was spent in the scheduler.
> 6 Sep 2023 @ 09:45:41.328 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 30 milliseconds, of which 0 milliseconds was spent in the scheduler.
> 6 Sep 2023 @ 09:49:42.863 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 990 milliseconds, of which 2 milliseconds was spent in the scheduler. (correct load)
> 6 Sep 2023 @ 09:51:10.868 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 182 milliseconds, of which 144 milliseconds was spent in the scheduler.
> 6 Sep 2023 @ 09:53:53.576 UTC message:[Transaction State Manager 1]: Finished loading 62 transaction metadata from __transaction_state-13 in 177 milliseconds, of which 143 milliseconds was spent in the scheduler.
>
> Following the logs, I found:
> 1. kafka-3 shuts down and is removed from the ISR
> 2. kafka-3 restarts and rejoins the ISR
> 3. kafka-1 shuts down and unloads, then restarts and loads (correct)
> 4. kafka-2 shuts down and is removed from the ISR
> 5. kafka-2 rejoins the ISR
>
> There are two aspects to this problem:
> 1. TopicDelta shows a change whenever the partition epoch changes, but the partition epoch can change even when the leader does not.
> 2. A leader epoch can change without a new leader being elected. In this case, we should check whether the transaction coordinator has already loaded the partition's state to avoid reloads.
>
> The new group coordinator already checks whether the state is loaded (indicated by a running coordinator state), so we don't see the same problem there.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)