[
https://issues.apache.org/jira/browse/KAFKA-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
José Armando García Sancio updated KAFKA-16927:
-----------------------------------------------
Description:
Restarting an inactive controller fails to start if the active leader has more
endpoint than the latest voter set. The easiest to reproduce is with the
following configuration
{code:java}
cat kafka.properties | grep controller.quorum
controller.quorum.voters=0@controller-0:1234,1@controller-1:1234,2@controller-2:1234{code}
This is what is executed in the QuorumState loading code:
{code:java}
if (leaderEndpoints.isEmpty()) {
...
} else {
initialState = new FollowerState(
time,
election.epoch(),
election.leaderId(),
leaderEndpoints,
voters.voterIds(),
Optional.empty(),
fetchTimeoutMs,
logContext
);
}{code}
If the leader has two endpoints it will send the following BEGIN_QUORUM_EPOCH
request:
{code:java}
"leaderEndpoints":[{"name":"CONTROLLER_PLAINTEXT","host":"controller-0","port":1234},{"name":"CONTROLLER","host":"controller-0","port":4321}]{code}
And this code doesn't handle that correctly:
{code:java}
} else if (
leaderId.isPresent() &&
(!quorum.hasLeader() || leaderEndpoints.size() >
quorum.leaderEndpoints().size())
) {
// The request or response indicates the leader of the current
epoch
// which are currently unknown or the replica has discovered more
endpoints
transitionToFollower(epoch, leaderId.getAsInt(), leaderEndpoints,
currentTimeMs);
}{code}
After adding a test for this, the test fails with the following:
{code:java}
Gradle Test Run :raft:test > Gradle Test Executor 44 > KafkaRaftClientTest >
testHandleBeginQuorumRequestMoreEndpoints() FAILED
java.lang.AssertionError: Assertion failed with an exception
at
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:453)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:393)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:377)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:367)
at
org.apache.kafka.raft.RaftClientTestContext.pollUntil(RaftClientTestContext.java:617)
at
org.apache.kafka.raft.RaftClientTestContext.pollUntilResponse(RaftClientTestContext.java:624)
at
org.apache.kafka.raft.KafkaRaftClientTest.testHandleBeginQuorumRequestMoreEndpoints(KafkaRaftClientTest.java:993)
Caused by:
java.lang.IllegalStateException: Cannot transition to Follower with
leader 829 and epoch 3 from state FollowerState(fetchTimeoutMs=50000, epoch=3,
leader=829,
leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10819}),
voters=[828, 829], highWatermark=Optional.empty,
fetchingSnapshot=Optional.empty)
at
org.apache.kafka.raft.QuorumState.transitionToFollower(QuorumState.java:480)
at
org.apache.kafka.raft.KafkaRaftClient.transitionToFollower(KafkaRaftClient.java:732)
at
org.apache.kafka.raft.KafkaRaftClient.maybeTransition(KafkaRaftClient.java:2434)
at
org.apache.kafka.raft.KafkaRaftClient.handleBeginQuorumEpochRequest(KafkaRaftClient.java:1018)
at
org.apache.kafka.raft.KafkaRaftClient.handleRequest(KafkaRaftClient.java:2565)
at
org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:2613)
at
org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:3314)
at
org.apache.kafka.raft.RaftClientTestContext.lambda$pollUntil$1(RaftClientTestContext.java:618)
at
org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:396)
at
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:444)
... 6 more
1 test completed, 1 failed {code}
was:We'll need to test that restarting followers are populated with correct
leader endpoint after receiving BeginQuorumEpochRequest. Depends on KAFKA-16536
and voter RPC changes to be done
> Adding tests for restarting followers receiving leader endpoint correctly
> -------------------------------------------------------------------------
>
> Key: KAFKA-16927
> URL: https://issues.apache.org/jira/browse/KAFKA-16927
> Project: Kafka
> Issue Type: Sub-task
> Components: kraft
> Reporter: Alyssa Huang
> Assignee: José Armando García Sancio
> Priority: Major
>
> Restarting an inactive controller fails to start if the active leader has
> more endpoint than the latest voter set. The easiest to reproduce is with the
> following configuration
> {code:java}
> cat kafka.properties | grep controller.quorum
> controller.quorum.voters=0@controller-0:1234,1@controller-1:1234,2@controller-2:1234{code}
> This is what is executed in the QuorumState loading code:
> {code:java}
> if (leaderEndpoints.isEmpty()) {
> ...
> } else {
> initialState = new FollowerState(
> time,
> election.epoch(),
> election.leaderId(),
> leaderEndpoints,
> voters.voterIds(),
> Optional.empty(),
> fetchTimeoutMs,
> logContext
> );
> }{code}
> If the leader has two endpoints it will send the following BEGIN_QUORUM_EPOCH
> request:
> {code:java}
>
> "leaderEndpoints":[{"name":"CONTROLLER_PLAINTEXT","host":"controller-0","port":1234},{"name":"CONTROLLER","host":"controller-0","port":4321}]{code}
> And this code doesn't handle that correctly:
> {code:java}
> } else if (
> leaderId.isPresent() &&
> (!quorum.hasLeader() || leaderEndpoints.size() >
> quorum.leaderEndpoints().size())
> ) {
> // The request or response indicates the leader of the current
> epoch
> // which are currently unknown or the replica has discovered
> more endpoints
> transitionToFollower(epoch, leaderId.getAsInt(),
> leaderEndpoints, currentTimeMs);
> }{code}
> After adding a test for this, the test fails with the following:
> {code:java}
> Gradle Test Run :raft:test > Gradle Test Executor 44 > KafkaRaftClientTest >
> testHandleBeginQuorumRequestMoreEndpoints() FAILED
> java.lang.AssertionError: Assertion failed with an exception
> at
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:453)
> at
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:393)
> at
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:377)
> at
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:367)
> at
> org.apache.kafka.raft.RaftClientTestContext.pollUntil(RaftClientTestContext.java:617)
> at
> org.apache.kafka.raft.RaftClientTestContext.pollUntilResponse(RaftClientTestContext.java:624)
> at
> org.apache.kafka.raft.KafkaRaftClientTest.testHandleBeginQuorumRequestMoreEndpoints(KafkaRaftClientTest.java:993)
> Caused by:
> java.lang.IllegalStateException: Cannot transition to Follower with
> leader 829 and epoch 3 from state FollowerState(fetchTimeoutMs=50000,
> epoch=3, leader=829,
> leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10819}),
> voters=[828, 829], highWatermark=Optional.empty,
> fetchingSnapshot=Optional.empty)
> at
> org.apache.kafka.raft.QuorumState.transitionToFollower(QuorumState.java:480)
> at
> org.apache.kafka.raft.KafkaRaftClient.transitionToFollower(KafkaRaftClient.java:732)
> at
> org.apache.kafka.raft.KafkaRaftClient.maybeTransition(KafkaRaftClient.java:2434)
> at
> org.apache.kafka.raft.KafkaRaftClient.handleBeginQuorumEpochRequest(KafkaRaftClient.java:1018)
> at
> org.apache.kafka.raft.KafkaRaftClient.handleRequest(KafkaRaftClient.java:2565)
> at
> org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:2613)
> at
> org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:3314)
> at
> org.apache.kafka.raft.RaftClientTestContext.lambda$pollUntil$1(RaftClientTestContext.java:618)
> at
> org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:396)
> at
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:444)
> ... 6 more
> 1 test completed, 1 failed {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)