Kyle Phelps created KAFKA-17249:
-----------------------------------

             Summary: Failures when building remote log aux state can make the leader epoch cache inconsistent
                 Key: KAFKA-17249
                 URL: https://issues.apache.org/jira/browse/KAFKA-17249
             Project: Kafka
          Issue Type: Bug
          Components: Tiered-Storage
    Affects Versions: 3.7.1, 3.8.0
            Reporter: Kyle Phelps

When a follower has to `buildRemoteLogAuxState`, it truncates the local log and then attempts to rebuild the leader epoch cache from the checkpoint in remote storage. However, if the rebuild fails and the broker is restarted, the cache is missing the entries associated with remote segments.

Reproduction steps:
# Take an existing tiered storage partition and move the latest index file out of remote storage so that it is inaccessible.
# Stop one of the follower brokers and delete the partition's local data.
# Restart the follower. It should fail to build the aux state.
# Restart the follower again. Since the log's offsets have already been updated, it can now fetch successfully and join the ISR.
# Promote the follower to leader.

In this scenario the leader becomes unable to serve tiered fetch requests.

I _think_ the root of the problem is that the leader epoch cache isn't recovering the epoch data for remote segments.
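
The suspected sequence can be illustrated with a small toy model. This is only a sketch and not Kafka code: `ToyEpochCache`, `ToyFollower`, and their methods are hypothetical names, and the model assumes that truncation clears the cache and that the updated log offsets survive the failed rebuild, as the description above suggests.

```python
# Toy model only: none of these names are Kafka APIs.

class ToyEpochCache:
    """Maps leader epoch -> start offset of that epoch."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def epoch_for_offset(self, offset):
        """Return the largest epoch whose start offset is <= offset, or None."""
        candidates = [e for e, start in self.entries.items() if start <= offset]
        return max(candidates) if candidates else None


class ToyFollower:
    def __init__(self):
        self.cache = ToyEpochCache()
        self.local_start_offset = 0

    def build_aux_state(self, leader_local_start_offset, fetch_remote_checkpoint):
        # Step 1 (assumed): truncate the local log and move its offsets forward.
        self.cache = ToyEpochCache()
        self.local_start_offset = leader_local_start_offset
        # Step 2: rebuild the cache from the remote checkpoint; this may raise.
        self.cache = ToyEpochCache(fetch_remote_checkpoint())

    def restart_and_fetch(self, current_epoch):
        # The offsets were already updated, so the follower fetches normally
        # and only learns the epoch starting at its local start offset.
        self.cache.entries.setdefault(current_epoch, self.local_start_offset)


def broken_remote_fetch():
    raise OSError("remote index file is inaccessible")


follower = ToyFollower()
try:
    follower.build_aux_state(
        leader_local_start_offset=1000,
        fetch_remote_checkpoint=broken_remote_fetch,
    )
except OSError:
    pass  # the aux state build fails, but the offsets were already moved

follower.restart_and_fetch(current_epoch=5)

# What a cache that had recovered the remote checkpoint would contain.
healthy = ToyEpochCache({0: 0, 3: 400, 5: 900})

# Offset 500 lives in a remote segment: the healthy cache can resolve its
# epoch, while the follower's cache has no entry covering it.
remote_offset = 500
healthy_epoch = healthy.epoch_for_offset(remote_offset)
follower_epoch = follower.cache.epoch_for_offset(remote_offset)
```

In this model the follower ends up with a single entry for the current epoch, so any lookup for an offset held only in remote segments finds no epoch. That matches the symptom of the promoted leader being unable to serve tiered fetch requests.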