[ 
https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086227#comment-17086227
 ] 

ASF GitHub Bot commented on KAFKA-9839:
---------------------------------------

apovzner commented on pull request #8509: KAFKA-9839: Broker should accept 
control requests with newer broker epoch
URL: https://github.com/apache/kafka/pull/8509
 
 
   A broker throws IllegalStateException if the broker epoch in a 
LeaderAndIsr/UpdateMetadataRequest/StopReplicaRequest is larger than its 
current broker epoch. However, there is no guarantee that the broker learns 
its latest broker epoch before the controller does: when the broker 
registers with ZK, it still has a few more instructions to process before it 
"knows" its new epoch, while the controller may already have been notified 
and may send, for example, an UPDATE_METADATA request with the new epoch. The 
broker then rejects that request, so clients get stale metadata from this broker. 
   
   With this PR, a broker accepts a 
LeaderAndIsr/UpdateMetadataRequest/StopReplicaRequest if the broker epoch in 
the request is newer than the broker's current epoch.
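
The difference between the two behaviours can be illustrated with a minimal Java sketch. This is not the actual `KafkaApis` code: the class name `BrokerEpochCheck`, both method names, and the epoch values are hypothetical and chosen only for illustration. It assumes a request counts as stale only when its broker epoch is strictly older than the one the broker currently knows.

```java
/** Illustrative sketch only; not the actual KafkaApis code. All names are hypothetical. */
public final class BrokerEpochCheck {
    private BrokerEpochCheck() {}

    /**
     * Strict check, modelling the behaviour described above: a request epoch
     * larger than the broker's own epoch is treated as an illegal state.
     */
    public static boolean isStaleStrict(long requestEpoch, long currentEpoch) {
        if (requestEpoch > currentEpoch) {
            throw new IllegalStateException(
                "Epoch " + requestEpoch + " larger than current broker epoch " + currentEpoch);
        }
        return requestEpoch < currentEpoch;
    }

    /**
     * Relaxed check: only a strictly older request epoch is stale. A newer
     * epoch is accepted, on the assumption that the broker simply has not
     * finished learning its own new epoch yet.
     */
    public static boolean isStaleRelaxed(long requestEpoch, long currentEpoch) {
        return requestEpoch < currentEpoch;
    }

    public static void main(String[] args) {
        long currentEpoch = 10L; // epoch the broker currently knows (made-up value)
        long newerEpoch = 12L;   // epoch the controller already uses (made-up value)

        // Relaxed check: the newer epoch is not stale, an older one is.
        System.out.println(isStaleRelaxed(newerEpoch, currentEpoch)); // prints "false"
        System.out.println(isStaleRelaxed(8L, currentEpoch));         // prints "true"

        // Strict check: the same newer epoch raises IllegalStateException.
        try {
            isStaleStrict(newerEpoch, currentEpoch);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the relaxed check, a control request that arrives before the broker has finished registering is processed instead of failing, while requests carrying an older epoch are still rejected as stale.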
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> IllegalStateException on metadata update when broker learns about its new 
> epoch after the controller
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9839
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9839
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, core
>    Affects Versions: 2.3.1
>            Reporter: Anna Povzner
>            Assignee: Anna Povzner
>            Priority: Critical
>
> Broker throws "java.lang.IllegalStateException: Epoch XXX larger than current 
> broker epoch YYY" on UPDATE_METADATA when the controller learns about the 
> broker epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker 
> completes (that is, before the broker learns about its new epoch).
> Here is the scenario we observed in more detail:
> 1. ZK session expires on broker 1
> 2. Broker 1 establishes a new session to ZK and creates a znode
> 3. Controller learns about broker 1 and assigns an epoch
> 4. Broker 1 receives UPDATE_METADATA from controller, but it does not know 
> about its new epoch yet, so we get an exception:
> ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, 
> api=UPDATE_METADATA, body={
> .........
> java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY
>     at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725)
>     at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:139)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
>     at java.lang.Thread.run(Thread.java:748)
> 5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the 
> created znode at /brokers/ids/1"
> As a result, the broker has stale metadata for some time.
> Possible solutions:
> 1. Broker returns a more specific error and the controller retries UPDATE_METADATA
> 2. Broker accepts UPDATE_METADATA with a larger broker epoch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)