[
https://issues.apache.org/jira/browse/KAFKA-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764143#comment-17764143
]
Francois Visconte commented on KAFKA-15414:
-------------------------------------------
Not sure it's the same issue happening again, but I see some strange behaviour
while trying to reassign my partitions while consuming from the past (and
hitting tiered storage).
It seems that at some point my consumer offset lag goes backward:
!Screenshot 2023-09-12 at 13.53.07.png|width=1355,height=191!
And I get a burst of errors like the following on a handful of partitions (3
partitions out of 32):
{code:java}
[ReplicaFetcher replicaId=10002, leaderId=10007, fetcherId=2] Error building remote log auxiliary state for loadtest14-21
org.apache.kafka.server.log.remote.storage.RemoteStorageException: Couldn't build the state from remote store for partition: loadtest14-21, currentLeaderEpoch: 13, leaderLocalLogStartOffset: 81012034, leaderLogStartOffset: 0, epoch: 12 as the previous remote log segment metadata was not found
	at kafka.server.ReplicaFetcherTierStateMachine.buildRemoteLogAuxState(ReplicaFetcherTierStateMachine.java:252)
	at kafka.server.ReplicaFetcherTierStateMachine.start(ReplicaFetcherTierStateMachine.java:102)
	at kafka.server.AbstractFetcherThread.handleOffsetsMovedToTieredStorage(AbstractFetcherThread.scala:761)
	at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:412)
	at scala.Option.foreach(Option.scala:437)
	at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:332)
	at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:331)
	at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)
	at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:407)
	at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:403)
	at scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:321)
	at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:331)
	at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
	at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
	at scala.Option.foreach(Option.scala:437)
	at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
	at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
	at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:98)
	at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
{code}
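
For reference, here is a minimal sketch (standard Java AdminClient; the broker
address is a placeholder) of one way to check which start offsets the leader
reports for an affected partition, to compare against the
leaderLogStartOffset/leaderLocalLogStartOffset values in the error above:
{code:java}
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;

public class CheckOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            TopicPartition tp = new TopicPartition("loadtest14", 21);
            ListOffsetsResult.ListOffsetsResultInfo earliest =
                    admin.listOffsets(Map.of(tp, OffsetSpec.earliest())).partitionResult(tp).get();
            ListOffsetsResult.ListOffsetsResultInfo latest =
                    admin.listOffsets(Map.of(tp, OffsetSpec.latest())).partitionResult(tp).get();
            // The earliest offset is the leader's log start offset (0 in the error above);
            // everything below the local log start offset (81012034) should only exist in tiered storage.
            System.out.printf("log start offset = %d, log end offset = %d%n",
                    earliest.offset(), latest.offset());
        }
    }
}
{code}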
> remote logs get deleted after partition reassignment
> ----------------------------------------------------
>
> Key: KAFKA-15414
> URL: https://issues.apache.org/jira/browse/KAFKA-15414
> Project: Kafka
> Issue Type: Bug
> Reporter: Luke Chen
> Assignee: Kamal Chandraprakash
> Priority: Blocker
> Fix For: 3.6.0
>
> Attachments: Screenshot 2023-09-12 at 13.53.07.png,
> image-2023-08-29-11-12-58-875.png
>
>
> It seems I'm reaching that codepath when running reassignments on my cluster,
> and segments are deleted from the remote store despite a huge retention (the
> topic was created a few hours ago with 1000h retention).
> It seems to happen consistently on some partitions when reassigning, but not
> on all partitions.
> My test:
> I have a test topic with 30 partitions, configured with 1000h global retention
> and 2 minutes of local retention.
> I have a load tester producing to all partitions evenly.
> I have a consumer load tester consuming that topic.
> I regularly reset offsets to earliest on my consumer to test backfilling from
> tiered storage.
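>
> For reference, a rough sketch of that topic setup (a minimal example using the
> standard Java AdminClient; the topic name, broker address, and replication
> factor are placeholders, not the exact values from my test):
> {code:java}
> import org.apache.kafka.clients.admin.Admin;
> import org.apache.kafka.clients.admin.AdminClientConfig;
> import org.apache.kafka.clients.admin.NewTopic;
>
> import java.util.List;
> import java.util.Map;
> import java.util.Properties;
>
> public class CreateLoadtestTopic {
>     public static void main(String[] args) throws Exception {
>         Properties props = new Properties();
>         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
>         try (Admin admin = Admin.create(props)) {
>             // 30 partitions, tiered storage enabled, 1000h global retention, 2 min local retention
>             NewTopic topic = new NewTopic("loadtest", 30, (short) 3)
>                     .configs(Map.of(
>                             "remote.storage.enable", "true",
>                             "retention.ms", "3600000000",     // 1000h
>                             "local.retention.ms", "120000")); // 2 minutes
>             admin.createTopics(List.of(topic)).all().get();
>         }
>     }
> }
> {code}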
> My consumer was catching up on the backlog and I wanted to upscale my cluster
> to speed up recovery: I scaled my cluster from 3 to 12 brokers and reassigned
> my test topic across all available brokers to get an even leader/follower
> count per broker.
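>
> A rough sketch of the reassignment step itself (again the Java AdminClient;
> broker ids, topic name, and the round-robin placement are placeholders for
> whatever plan the reassignment tool generates):
> {code:java}
> import org.apache.kafka.clients.admin.Admin;
> import org.apache.kafka.clients.admin.AdminClientConfig;
> import org.apache.kafka.clients.admin.NewPartitionReassignment;
> import org.apache.kafka.common.TopicPartition;
>
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> import java.util.Optional;
> import java.util.Properties;
>
> public class ReassignLoadtestTopic {
>     public static void main(String[] args) throws Exception {
>         List<Integer> brokers = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11); // placeholder ids
>         Map<TopicPartition, Optional<NewPartitionReassignment>> plan = new HashMap<>();
>         for (int p = 0; p < 30; p++) {
>             List<Integer> replicas = new ArrayList<>();
>             for (int r = 0; r < 3; r++) {
>                 replicas.add(brokers.get((p + r) % brokers.size())); // simple round-robin spread
>             }
>             plan.put(new TopicPartition("loadtest", p),
>                     Optional.of(new NewPartitionReassignment(replicas)));
>         }
>         Properties props = new Properties();
>         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
>         try (Admin admin = Admin.create(props)) {
>             admin.alterPartitionReassignments(plan).all().get();
>         }
>     }
> }
> {code}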
> When I triggered the reassignment, the consumer lag dropped on some of my
> topic partitions:
> !image-2023-08-29-11-12-58-875.png|width=800,height=79!
> Later I tried to reassign back my topic to 3 brokers and the issue happened
> again.
> Both times, I've seen a bunch of log lines like:
> [RemoteLogManager=10005 partition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17]
> Deleted remote log segment
> RemoteLogSegmentId{topicIdPartition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17, id=Mk0chBQrTyKETTawIulQog}
> due to leader epoch cache truncation. Current earliest epoch:
> EpochEntry(epoch=14, startOffset=46776780), segmentEndOffset: 46437796 and
> segmentEpochs: [10]
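>
> To make the numbers in that log line concrete, here is how I read the check
> (a sketch of my interpretation only, an assumption about the cleanup logic,
> not code taken from the broker):
> {code:java}
> import java.util.List;
>
> // Hedged sketch: names and the exact condition are assumptions, not Kafka's actual code.
> public class EpochTruncationCheck {
>     public static void main(String[] args) {
>         int earliestEpochInCache = 14;              // EpochEntry(epoch=14, startOffset=46776780)
>         List<Integer> segmentEpochs = List.of(10);  // segmentEpochs: [10] from the log line
>
>         // Every epoch the segment covers is older than the earliest epoch the leader still
>         // tracks, so the segment looks like it is outside the current leader-epoch lineage
>         // and gets deleted, regardless of the 1000h retention.ms.
>         boolean outsideEpochHistory =
>                 segmentEpochs.stream().allMatch(e -> e < earliestEpochInCache);
>         System.out.println("deleted due to leader epoch cache truncation? " + outsideEpochHistory);
>     }
> }
> {code}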
> Looking at my S3 bucket, the segments prior to my reassignment have indeed
> been deleted.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)