[ https://issues.apache.org/jira/browse/KAFKA-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francois Visconte updated KAFKA-17249:
--------------------------------------
    Affects Version/s: 3.9.0

> Failures when building remote log aux state can make the leader epoch cache inconsistent
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17249
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17249
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.8.0, 3.7.1, 3.9.0
>            Reporter: Kyle Phelps
>            Priority: Major
>
> When a follower has to `buildRemoteLogAuxState` it truncates the local log.
> Then it attempts to rebuild the epoch cache from the checkpoint in remote
> storage. However, if this fails and the broker is restarted, the cache is
> missing entries associated with remote segments.
>
> Reproduction steps:
> # Take an existing tiered storage partition - move the latest index file
> from remote storage so it will be inaccessible.
> # Stop one of the follower brokers, delete the partition's local data.
> # Restart the follower - it should be failing to build aux state.
> # Restart the follower again. Since the log's offsets have been updated, it
> can now successfully fetch and join the ISR.
> # Promote the follower to the leader.
>
> In this scenario the leader becomes unable to serve tiered fetch requests.
> I _think_ the root of the problem here is that the leader epoch cache isn't
> recovering the epoch data for remote segments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
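
The suspected failure mode in the report can be illustrated with a toy model. The sketch below is not Kafka's code: `EpochCache`, its methods, and all offsets and epochs are hypothetical, chosen only to show why a cache that lost the entries for remote segments can no longer map an older, tiered offset to a leader epoch. It assumes the cache is a sorted list of (epoch, start offset) pairs, which is a simplification.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


class EpochCache:
    """Toy leader epoch cache (hypothetical, not Kafka's implementation).

    Holds (epoch, start_offset) pairs in increasing order.
    """

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int]] = []

    def assign(self, epoch: int, start_offset: int) -> None:
        # Record an entry only when a newer epoch begins.
        if not self.entries or epoch > self.entries[-1][0]:
            self.entries.append((epoch, start_offset))

    def epoch_for_offset(self, offset: int) -> Optional[int]:
        # Find the last entry whose start offset is <= the requested offset.
        starts = [start for _, start in self.entries]
        i = bisect_right(starts, offset) - 1
        return self.entries[i][0] if i >= 0 else None


# Healthy replica: the cache covers remote (tiered) and local segments.
healthy = EpochCache()
healthy.assign(0, 0)      # assumed: epoch 0 starts at offset 0 (remote)
healthy.assign(1, 100)    # assumed: epoch 1 starts at offset 100 (remote)
healthy.assign(2, 250)    # assumed: epoch 2 starts at offset 250

# Follower from the reproduction steps: the local log was truncated, the
# rebuild from the remote checkpoint failed, and after the second restart
# it only learns the epoch for offsets it fetches locally (assumed: 300+).
broken = EpochCache()
broken.assign(2, 300)

print(healthy.epoch_for_offset(150))
print(broken.epoch_for_offset(150))
```

In this model the healthy cache resolves offset 150 to an epoch, while the rebuilt-from-nothing cache has no entry at or below that offset. If that follower is then promoted, a fetch for an offset in a remote segment has no epoch to validate against, which matches the symptom described above under these assumptions.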