Kyle Phelps created KAFKA-17249:
-----------------------------------

             Summary: Failures when building remote log aux state can make the 
leader epoch cache inconsistent
                 Key: KAFKA-17249
                 URL: https://issues.apache.org/jira/browse/KAFKA-17249
             Project: Kafka
          Issue Type: Bug
          Components: Tiered-Storage
    Affects Versions: 3.7.1, 3.8.0
            Reporter: Kyle Phelps


When a follower has to run `buildRemoteLogAuxState`, it first truncates the 
local log and then attempts to rebuild the leader epoch cache from the 
checkpoint in remote storage. However, if the rebuild fails and the broker is 
restarted, the cache is missing the entries associated with remote segments.

Reproduction steps:
 # Take an existing tiered storage partition and move the latest index file out 
of remote storage so that it is inaccessible.
 # Stop one of the follower brokers, delete the partition's local data.
 # Restart the follower. It should fail to build the aux state.
 # Restart the follower again. Since the log's offsets have been updated, it 
can now successfully fetch and join the ISR.
 # Promote the follower to the leader.

In this scenario, the new leader is unable to serve tiered fetch requests.

I _think_ the root of the problem here is that the leader epoch cache isn't 
recovering the epoch data for remote segments.
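
To illustrate the suspected failure mode, here is a minimal sketch of an epoch 
cache that has lost its remote-segment entries. It is a simplified, 
hypothetical model and not Kafka's actual leader epoch cache implementation: 
the class `EpochCacheSketch`, its methods, and all epochs and offsets are 
assumptions made up for the example.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.TreeMap;

// Hypothetical, simplified model of a leader epoch cache.
// This is NOT Kafka's implementation; names and offsets are made up.
public class EpochCacheSketch {
    // Start offset of each leader epoch -> epoch number.
    private final TreeMap<Long, Integer> epochByStartOffset = new TreeMap<>();

    public void assign(int epoch, long startOffset) {
        epochByStartOffset.put(startOffset, epoch);
    }

    // Epoch covering the offset, or empty if no entry starts at or before it.
    public OptionalInt epochForOffset(long offset) {
        Map.Entry<Long, Integer> entry = epochByStartOffset.floorEntry(offset);
        return entry == null ? OptionalInt.empty() : OptionalInt.of(entry.getValue());
    }

    // Cache as it should look: epochs 0 and 1 live in remote segments
    // (offsets 0-249), epoch 2 starts at the local log start offset 250.
    public static EpochCacheSketch expected() {
        EpochCacheSketch cache = new EpochCacheSketch();
        cache.assign(0, 0L);
        cache.assign(1, 100L);
        cache.assign(2, 250L);
        return cache;
    }

    // Cache after truncation, a failed rebuild from the remote checkpoint,
    // and a restart: only entries learned from newly fetched local data exist.
    public static EpochCacheSketch afterFailedRebuild() {
        EpochCacheSketch cache = new EpochCacheSketch();
        cache.assign(2, 250L);
        return cache;
    }

    public static void main(String[] args) {
        long remoteOffset = 120L; // an offset that lives only in remote storage
        System.out.println("expected cache has epoch:       "
                + expected().epochForOffset(remoteOffset).isPresent());
        System.out.println("after failed rebuild has epoch: "
                + afterFailedRebuild().epochForOffset(remoteOffset).isPresent());
    }
}
```

In this model, a lookup for an offset that exists only in remote storage finds 
no epoch once the rebuild has failed, which would be consistent with a 
promoted leader being unable to serve tiered fetch requests.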



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
