jolshan commented on PR #16760: URL: https://github.com/apache/kafka/pull/16760#issuecomment-2263594519
@chia7712 We encountered an issue where the BrokerMetadataPublisher threw a metadata fault as a result of a (mostly benign) IllegalStateException in TransactionStateManager. We (@artemlivshits) had done some previous analysis on this error, which is potentially caused by the fact that we read state from the LEO rather than the HWM. (We could file a bug for this too.)

> 1. A transaction started transitioning to CompleteCommit, wrote to the local log, and waited for the write to replicate.
> 2. The transaction coordinator did a no-op reload (KAFKA-15468). The epoch didn't change, and it loaded the state up to the LEO and saw the CompleteCommit state (even though that write was still waiting to replicate).
> 3. The operation from step 1 finished replicating and called the callback to finish the transition into CompleteCommit.
> 4. But the pending state had been removed, so the transition into CompleteCommit failed with an IllegalStateException.

When we did this analysis, we assumed the error would be caught by the action queue. However, in the BrokerMetadataPublisher path it is not caught, for example when makeLeader is called. (I describe above how it would be handled in the ZK world.)

Note that while this example focuses on an error in the transactions path, another delayed action could also throw an error and result in the same metadata fault, which is why we chose this approach to the fix.
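The four-step race and the containment idea can be illustrated with a minimal Python sketch. It uses a toy model only: `TxnMetadata`, `ActionQueue`, `IllegalStateError`, and all method names are hypothetical and are not Kafka's actual (Scala/Java) classes or APIs.

The sketch shows three things:
- A reload that reads up to the LEO drops the pending transition.
- The replication callback then fails.
- A queue that catches and logs each delayed action's exception keeps that failure from propagating to its caller. Here the caller stands in for the BrokerMetadataPublisher path.

This illustrates one way to contain such errors, not the actual change in this PR.

```python
import logging
from collections import deque
from typing import Callable, Deque, List, Optional

log = logging.getLogger("action-queue-sketch")


class IllegalStateError(Exception):
    """Hypothetical stand-in for Java's IllegalStateException."""


class TxnMetadata:
    """Toy model of a coordinator's view of one transaction (hypothetical)."""

    def __init__(self) -> None:
        self.state = "Ongoing"
        self.pending_state: Optional[str] = None

    def prepare_transition(self, target: str) -> None:
        # Step 1: record the pending transition while the log write replicates.
        self.pending_state = target

    def reload_from_log_end(self, state_at_leo: str) -> None:
        # Step 2: a no-op reload that reads up to the LEO adopts the
        # not-yet-replicated state and clears the pending transition.
        self.state = state_at_leo
        self.pending_state = None

    def complete_transition(self, target: str) -> None:
        # Steps 3-4: the replication callback finishes the transition,
        # which fails because the pending state is gone.
        if self.pending_state != target:
            raise IllegalStateError(f"no pending transition to {target}")
        self.state = target
        self.pending_state = None


class ActionQueue:
    """Hypothetical delayed-action queue that isolates per-action failures."""

    def __init__(self) -> None:
        self._actions: Deque[Callable[[], None]] = deque()
        self.failures: List[Exception] = []

    def add(self, action: Callable[[], None]) -> None:
        self._actions.append(action)

    def try_complete_actions(self) -> None:
        # Run every queued action; an exception from one action is logged and
        # recorded instead of escaping to the caller or skipping later actions.
        while self._actions:
            action = self._actions.popleft()
            try:
                action()
            except Exception as e:
                log.error("Delayed action failed: %s", e)
                self.failures.append(e)


txn = TxnMetadata()
txn.prepare_transition("CompleteCommit")   # step 1
txn.reload_from_log_end("CompleteCommit")  # step 2
queue = ActionQueue()
queue.add(lambda: txn.complete_transition("CompleteCommit"))  # steps 3-4
ran: List[str] = []
queue.add(lambda: ran.append("next action"))
queue.try_complete_actions()  # does not raise; the error is logged
print(txn.state, len(queue.failures), ran)
```

Without the `try`/`except` in `try_complete_actions`, the `IllegalStateError` would escape to whoever drained the queue. That mirrors how an uncaught delayed-action error can surface as a metadata fault. It also explains why catching errors generically, rather than only in the transactions path, covers other delayed actions too.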
