hudeqi commented on code in PR #14652:
URL: https://github.com/apache/kafka/pull/14652#discussion_r1398774988


##########
core/src/main/java/kafka/server/ReplicaFetcherTierStateMachine.java:
##########
@@ -238,12 +238,8 @@ private Long buildRemoteLogAuxState(TopicPartition 
topicPartition,
 
                 log.debug("Updated the epoch cache from remote tier till 
offset: {} with size: {} for {}", leaderLocalLogStartOffset, epochs.size(), 
partition);
 
-                // Restore producer snapshot
-                File snapshotFile = 
LogFileUtils.producerSnapshotFile(unifiedLog.dir(), nextOffset);
-                buildProducerSnapshotFile(snapshotFile, 
remoteLogSegmentMetadata, rlm);
-
                 // Reload producer snapshots.
-                
unifiedLog.producerStateManager().truncateFullyAndReloadSnapshots();
+                truncateFullyAndReloadRestoredSnapshots(unifiedLog, 
nextOffset, remoteLogSegmentMetadata, rlm);

Review Comment:
   > @hudeqi
   > 
   > > In producerStateManager, first call truncateFullyAndStartAt to clean up 
the snapshot files, and then pull the snapshot file from RemoteLogManager. If 
the order of calls is reversed in original logic, it may cause the newly built 
snapshot file to be cleaned up again by truncateFullyAndStartAt.
   > 
   > The `truncateFullyAndReloadSnapshots` only removes the snapshot files that 
was already loaded into the ProducerStateManager, so it doesn't remove the 
`snapshotFile` that was downloaded/built from the remote storage.
   > 
   > AFAIR, we don't want to expose reloading the snapshots from 
ProducerStateManager without clearing it's internal states 
(`ProducerStateManager#reloadSnapshots` method). We can add a comment int this 
method to capture this information for future readers.
   
   Hi, @kamalcph , the reason why such changes are made in 
ReplicaFetcherTierStateMachine here is because of a relatively coincidental 
boundary case: if the name of the file pulled from the remote storage is the 
same as the name of the local file before truncate (that is, the file name 
pulled from the remote storage is in the keySet of the `snapshots` map), the 
snapshot file built first will be cleared by the later 
`truncateFullyAndStartAt`. 
   
   I was adding the current unit test and found that the test was failed, so I 
discovered this issue. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to