Luke Chen created KAFKA-15414:
---------------------------------
Summary: remote logs get deleted after partition reassignment
Key: KAFKA-15414
URL: https://issues.apache.org/jira/browse/KAFKA-15414
Project: Kafka
Issue Type: Bug
Reporter: Luke Chen
Attachments: image-2023-08-29-11-12-58-875.png
It seems I'm reaching that codepath when running reassignments on my cluster:
segments are deleted from the remote store despite a very long retention (the
topic was created a few hours ago with 1000h retention).
It happens consistently on some partitions when reassigning, but not on all
partitions.
My test:
I have a test topic with 30 partitions, configured with 1000h global retention
and 2 minutes local retention.
I have a load tester producing to all partitions evenly.
I have a consumer load tester consuming that topic.
I regularly reset offsets to earliest on my consumer to test backfilling from
tiered storage.
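For reference, the retention settings above can be written as topic-level configs. This is only a sketch: it assumes "global retention" maps to `retention.ms`, "local retention" maps to `local.retention.ms`, and tiered storage is enabled per topic through `remote.storage.enable`. The class and method names are made up for illustration.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: builds the topic-level settings described in the test setup.
final class TieredTopicConfigSketch {

    // Returns the assumed topic configs, with durations converted to milliseconds.
    static Map<String, String> topicConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        // Assumption: tiered storage is switched on for this topic.
        config.put("remote.storage.enable", "true");
        // 1000h overall retention (assumed to be retention.ms).
        config.put("retention.ms", Long.toString(Duration.ofHours(1000).toMillis()));
        // 2 minutes of retention on local broker disks (assumed to be local.retention.ms).
        config.put("local.retention.ms", Long.toString(Duration.ofMinutes(2).toMillis()));
        return config;
    }

    public static void main(String[] args) {
        // Print each setting as key=value.
        topicConfig().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With these values, remote segments should not become eligible for retention-based deletion until about 41 days after they were written.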
My consumer was catching up consuming the backlog and I wanted to upscale my
cluster to speed up recovery: I upscaled my cluster from 3 to 12 brokers and
reassigned my test topic to all available brokers to have an even
leader/follower count per broker.
When I triggered the reassignment, the consumer lag dropped on some of my topic
partitions:
!image-2023-08-29-11-12-58-875.png|width=800,height=79! Screenshot 2023-08-28
at 20 57 09
Later I reassigned my topic back to 3 brokers and the issue happened again.
Both times, I saw many entries like this in my logs:
[RemoteLogManager=10005 partition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17] Deleted
remote log segment RemoteLogSegmentId
{topicIdPartition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17,
id=Mk0chBQrTyKETTawIulQog}
due to leader epoch cache truncation. Current earliest epoch:
EpochEntry(epoch=14, startOffset=46776780), segmentEndOffset: 46437796 and
segmentEpochs: [10]
Looking at my S3 bucket, the segments from before my reassignment have indeed
been deleted.
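The "leader epoch cache truncation" message quoted above suggests that a remote segment is treated as unreferenced when every leader epoch it contains is older than the earliest epoch in the current leader epoch cache. The sketch below only illustrates that inferred condition with the values from the log line. It is not the actual RemoteLogManager code, and the class and method names are hypothetical.

```java
import java.util.List;

// Illustrative only: the deletion condition as inferred from the log message.
final class EpochTruncationCheckSketch {

    // True when all of the segment's leader epochs are lower than the earliest
    // epoch still present in the leader epoch cache.
    static boolean allEpochsBelowEarliest(int earliestEpoch, List<Integer> segmentEpochs) {
        return segmentEpochs.stream().allMatch(epoch -> epoch < earliestEpoch);
    }

    public static void main(String[] args) {
        // Values from the log line: earliest cached epoch 14, segmentEpochs [10].
        boolean deleted = allEpochsBelowEarliest(14, List.of(10));
        System.out.println("segment considered unreferenced: " + deleted);
    }
}
```

Under this reading, the segment holding epoch 10 qualifies for deletion as soon as the leader's cache starts at epoch 14, regardless of the configured retention. That would match the behaviour seen after the reassignment.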
--
This message was sent by Atlassian Jira
(v8.20.10#820010)