[ https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086227#comment-17086227 ]
ASF GitHub Bot commented on KAFKA-9839:
---------------------------------------

apovzner commented on pull request #8509: KAFKA-9839: Broker should accept control requests with newer broker epoch
URL: https://github.com/apache/kafka/pull/8509

A broker throws IllegalStateException if the broker epoch in a LeaderAndIsr/UpdateMetadataRequest/StopReplicaRequest is larger than its current broker epoch. However, there is no guarantee that the broker learns its latest broker epoch before the controller does. When the broker registers with ZK, there are a few more instructions to process before this broker "knows" about its epoch. Meanwhile, the controller may already have been notified and sent an UPDATE_METADATA request (for example) with the new epoch. As a result, clients get stale metadata from this broker.

With this PR, a broker accepts LeaderAndIsr/UpdateMetadataRequest/StopReplicaRequest if the broker epoch in the request is newer than its current epoch.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
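The change boils down to how the broker compares the epoch carried by a control request with the epoch it currently knows. The Java below is a minimal sketch, not Kafka's code. The real check is `KafkaApis.isBrokerEpochStale`, written in Scala, and this sketch ignores any special "unknown epoch" handling the real code may have. The class `BrokerEpochCheck` and its method names are hypothetical.

```java
/**
 * Hypothetical sketch of the broker-side epoch check for control requests
 * (LeaderAndIsr, UpdateMetadata, StopReplica). Not Kafka's actual implementation.
 */
final class BrokerEpochCheck {

    private BrokerEpochCheck() {}

    /** Behavior described in KAFKA-9839 before the fix: a newer epoch is treated as an error. */
    static boolean isStaleBeforeFix(long epochInRequest, long currentEpoch) {
        if (epochInRequest < currentEpoch) {
            return true;  // request targets an older registration of this broker
        } else if (epochInRequest == currentEpoch) {
            return false; // epochs match: process the request
        } else {
            throw new IllegalStateException(
                "Epoch " + epochInRequest + " larger than current broker epoch " + currentEpoch);
        }
    }

    /**
     * Behavior proposed by the PR: only an older epoch is stale. A newer epoch is
     * accepted because the controller may learn it before the broker does.
     */
    static boolean isStaleAfterFix(long epochInRequest, long currentEpoch) {
        return epochInRequest < currentEpoch;
    }

    public static void main(String[] args) {
        long current = 100L;
        System.out.println(isStaleAfterFix(99L, current));  // prints true (older epoch is stale)
        System.out.println(isStaleAfterFix(100L, current)); // prints false (same epoch is accepted)
        System.out.println(isStaleAfterFix(101L, current)); // prints false (newer epoch is accepted)
        try {
            isStaleBeforeFix(101L, current);
        } catch (IllegalStateException e) {
            System.out.println("before fix: " + e.getMessage());
        }
    }
}
```

The only behavioral difference is the "request epoch greater than current epoch" branch. It used to throw; it now falls through to "not stale".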
> IllegalStateException on metadata update when broker learns about its new
> epoch after the controller
> ----------------------------------------------------------------------------------------------------
>
> Key: KAFKA-9839
> URL: https://issues.apache.org/jira/browse/KAFKA-9839
> Project: Kafka
> Issue Type: Bug
> Components: controller, core
> Affects Versions: 2.3.1
> Reporter: Anna Povzner
> Assignee: Anna Povzner
> Priority: Critical
>
> The broker throws "java.lang.IllegalStateException: Epoch XXX larger than current
> broker epoch YYY" on UPDATE_METADATA. This happens when the controller learns
> about the broker epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker
> completes, that is, before the broker learns about its new epoch.
>
> Here is the scenario we observed in more detail:
> 1. The ZK session expires on broker 1.
> 2. Broker 1 establishes a new session to ZK and creates its znode.
> 3. The controller learns about broker 1 and assigns an epoch.
> 4. Broker 1 receives UPDATE_METADATA from the controller, but it does not know
> about its new epoch yet, so we get an exception:
>
> ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0,
> api=UPDATE_METADATA, body={
> .........
> java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY
>     at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725)
>     at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:139)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
>     at java.lang.Thread.run(Thread.java:748)
>
> 5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the
> created znode at /brokers/ids/1"
>
> As a result, the broker has stale metadata for some time.
>
> Possible solutions:
> 1. The broker returns a more specific error and the controller retries UPDATE_METADATA.
> 2. The broker accepts UPDATE_METADATA with a larger broker epoch.
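To see why the old check leaves the broker with stale metadata, steps 1-5 of the reported scenario can be replayed in a few lines. This Java sketch is illustrative only. `EpochRaceReplay`, its nested `Broker` class, and the epoch values 10 and 11 are hypothetical. The broker's own epoch is modeled as a plain field that step 5 updates, not as Kafka's actual ZooKeeper registration path. The two staleness rules mirror the "before" and "after" behavior described in the PR.

```java
import java.util.function.BiPredicate;

/** Hypothetical replay of the KAFKA-9839 timeline; names and epoch values are illustrative. */
final class EpochRaceReplay {

    /** Minimal stand-in for a broker's view of its own epoch and its metadata cache. */
    static final class Broker {
        long knownEpoch;         // epoch the broker currently believes it has
        boolean metadataApplied; // whether the UPDATE_METADATA request was applied

        Broker(long knownEpoch) { this.knownEpoch = knownEpoch; }

        /** Handles UPDATE_METADATA with the given rule (returns true = reject as stale). */
        void handleUpdateMetadata(long epochInRequest, BiPredicate<Long, Long> isStale) {
            try {
                if (!isStale.test(epochInRequest, knownEpoch)) {
                    metadataApplied = true;
                }
            } catch (IllegalStateException e) {
                // Mirrors the ERROR log in the report: the request is dropped.
                System.out.println("ERROR handling UPDATE_METADATA: " + e.getMessage());
            }
        }
    }

    /** Old rule: a request epoch newer than the broker's own epoch is an error. */
    static boolean staleBeforeFix(long req, long cur) {
        if (req > cur) {
            throw new IllegalStateException("Epoch " + req + " larger than current broker epoch " + cur);
        }
        return req < cur;
    }

    /** New rule: only an older request epoch is stale. */
    static boolean staleAfterFix(long req, long cur) {
        return req < cur;
    }

    /** Replays steps 1-5; returns whether broker 1 applied the metadata before step 5. */
    static boolean replay(BiPredicate<Long, Long> rule) {
        Broker broker1 = new Broker(10L);              // step 1: epoch of the expired session
        long newEpoch = 11L;                           // steps 2-3: epoch the controller now sees
        broker1.handleUpdateMetadata(newEpoch, rule);  // step 4: request arrives "too early"
        boolean appliedBeforeRegistration = broker1.metadataApplied;
        broker1.knownEpoch = newEpoch;                 // step 5: registerBroker completes
        return appliedBeforeRegistration;
    }

    public static void main(String[] args) {
        System.out.println("before fix, metadata applied: " + replay(EpochRaceReplay::staleBeforeFix));
        System.out.println("after fix, metadata applied: " + replay(EpochRaceReplay::staleAfterFix));
    }
}
```

Running `main` first prints the error line for epoch 11 versus 10. It then reports `false` for the old rule and `true` for the new one. Under the old rule, the update sent in step 4 is simply lost, which is the stale-metadata window described in the report.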