jolshan commented on PR #16760:
URL: https://github.com/apache/kafka/pull/16760#issuecomment-2263594519

   @chia7712 
   We encountered an issue where the BrokerMetadataPublisher threw a metadata 
fault as a result of a (mostly benign) IllegalStateException in 
TransactionStateManager. We (@artemlivshits) had previously analyzed this 
error, which is potentially caused by the fact that we read state up to the 
LEO (log end offset) rather than the high watermark. (We could file a bug for 
this too.)
   
   > 1. Transaction started transitioning to CompleteCommit, wrote to the local 
   > log, and is waiting for the write to replicate.
   > 2. Transaction coordinator did a no-op reload (KAFKA-15468); the epoch 
   > didn't change, so it loaded the state up to the LEO and saw the 
   > CompleteCommit state (even though it's still waiting to replicate).
   > 3. The operation from step 1 finished replicating and called the callback 
   > to finish the transition into CompleteCommit.
   > 4. But the pending state had been removed, so the transition into 
   > CompleteCommit failed with IllegalStateException.
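
   To make the sequence quoted above concrete, here is a minimal, 
self-contained sketch of the race. Everything in it (PendingStateRace, 
TxnMetadata, prepareTransitionTo, completeTransitionTo, the "txn-1" id) is a 
hypothetical, simplified model for illustration and not the actual 
TransactionStateManager code. It only assumes that the reload replaces the 
cached entry with one that has no pending state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of the race; NOT Kafka's real classes.
public class PendingStateRace {
    enum TxnState { PREPARE_COMMIT, COMPLETE_COMMIT }

    static final class TxnMetadata {
        TxnState state;
        Optional<TxnState> pendingState = Optional.empty();

        TxnMetadata(TxnState state) { this.state = state; }

        // Record the intended transition while the log write replicates.
        void prepareTransitionTo(TxnState target) {
            pendingState = Optional.of(target);
        }

        // Replication callback: only valid if the matching pending state still exists.
        void completeTransitionTo(TxnState target) {
            if (!pendingState.equals(Optional.of(target))) {
                throw new IllegalStateException(
                    "No pending transition to " + target + "; current state is " + state);
            }
            state = target;
            pendingState = Optional.empty();
        }
    }

    // Replays the four steps; returns the exception message if the callback fails.
    static Optional<String> simulate() {
        Map<String, TxnMetadata> cache = new HashMap<>();
        cache.put("txn-1", new TxnMetadata(TxnState.PREPARE_COMMIT));

        // Step 1: start the transition and wait for replication.
        cache.get("txn-1").prepareTransitionTo(TxnState.COMPLETE_COMMIT);

        // Step 2: a no-op reload reads up to the LEO and replaces the cached
        // entry (assumption: the reloaded entry carries no pending state).
        cache.put("txn-1", new TxnMetadata(TxnState.COMPLETE_COMMIT));

        // Steps 3 and 4: the replication callback fires against the reloaded entry.
        try {
            cache.get("txn-1").completeTransitionTo(TxnState.COMPLETE_COMMIT);
            return Optional.empty();
        } catch (IllegalStateException e) {
            return Optional.of(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate().orElse("transition completed"));
    }
}
```

   In this model the failure is harmless for the transaction itself, since the 
reloaded entry is already in the target state. That matches the "mostly 
benign" description; the problem is only where the exception ends up.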
   
   When we did this analysis, we assumed the error would be caught by the 
action queue. However, in the BrokerMetadataPublisher path it is not caught, 
for example when makeLeader is called. (I describe above how it would be 
handled in the ZK world.)
   
   Note that while this example focused on an error in the transactions path, 
it is possible that another delayed action could throw an error and result in 
the same metadata fault, which is why we chose this approach to the fix.
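
   As a rough illustration of why a general guard covers more than the 
transactions case, here is a sketch of a queue that isolates failures of 
individual delayed actions from its caller. This is not the diff in this PR; 
GuardedActionQueue and its methods are hypothetical names, and the choice to 
catch RuntimeException and record it in a list (rather than log it) is an 
assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch; NOT Kafka's action queue implementation.
public class GuardedActionQueue {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final List<RuntimeException> failures = new ArrayList<>();

    public void add(Runnable action) {
        queue.add(action);
    }

    // Drain the queue. A failing action is recorded and skipped so that the
    // exception never propagates to the caller (for example, a metadata
    // publisher applying a new leader) and later actions still run.
    public void tryCompleteActions() {
        Runnable action;
        while ((action = queue.poll()) != null) {
            try {
                action.run();
            } catch (RuntimeException e) {
                failures.add(e); // a real broker would log this instead
            }
        }
    }

    public List<RuntimeException> failures() {
        return failures;
    }

    public static void main(String[] args) {
        GuardedActionQueue actions = new GuardedActionQueue();
        actions.add(() -> { throw new IllegalStateException("pending state removed"); });
        actions.add(() -> System.out.println("second action still ran"));
        actions.tryCompleteActions();
        System.out.println("recorded failures: " + actions.failures().size());
    }
}
```

   Because the guard wraps each action rather than one specific call site, any 
delayed action that throws is contained in the same way.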


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
