hudeqi commented on PR #14652: URL: https://github.com/apache/kafka/pull/14652#issuecomment-1788778264
> Heya @hudeqi, thank you for the contribution! I have been reviewing this code change and I am a bit uncertain to its purpose so I wanted to ask some follow up questions. As far as I understand the current code flow roughly does the following:
>
> 1. Download a snapshot file from remote storage
> 2. Reset data structures within ProducerStateManager and delete snapshots present in the _snapshots_ data structure via the `truncateFullyAndStartAt`
> 3. Read all snapshot files on disk and repopulate the data structures inside the ProducerStateManager
>
> However, since downloading the snapshot from remote storage does not update the _snapshots_ data structure I do not see how the new file will be deleted as part of the call to `truncateFullyAndStartAt`.
>
> I also found the JIRA description a bit confusing because it kept on linking to comments people made, but none of them detailed how this could be happening.
>
> Could you elaborate how the call to `truncateFullyAndStartAt` will delete the newly downloaded file? Alternatively have I misunderstood what you mean to do with this pull request?

Sorry for not stating this clearly in the JIRA. The JIRA was originally opened to add a unit test that validates the transactional state after the OFFSET_MOVED_TO_TIERED_STORAGE error is handled. While adding unit tests for this logic, I found that the test failed because a snapshot pulled from remote storage may subsequently be cleaned up. This issue is not yet reflected in the JIRA.

As for how the issue occurs in the original logic: `snapshots` is a map whose key is a `long` offset. If the name (the offset value) of the snapshot file that is first pulled from remote storage and constructed happens to already be in the key set of the local `snapshots`, the file may be cleaned up later.

@clolov I am not sure whether my understanding and handling are correct. @satishd please help to confirm. Thanks.
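The offset collision described in the reply can be illustrated with a toy model. This is only a sketch under stated assumptions, not Kafka's actual `ProducerStateManager`: the class `SnapshotCollisionSketch`, its methods, and the in-memory "disk" are all hypothetical, and the zero-padded file name is used purely for illustration. It assumes a full truncation deletes the file behind every offset tracked in the map, and that a remote download writes a file without updating the map.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model; none of these names are Kafka APIs.
public class SnapshotCollisionSketch {
    // Stand-in for the log directory: the set of file names present "on disk".
    private final Set<String> disk = new TreeSet<>();
    // Stand-in for the in-memory `snapshots` map, keyed by offset.
    private final Map<Long, String> snapshots = new TreeMap<>();

    // Illustrative naming only: the file name is derived from the offset.
    static String fileName(long offset) {
        return String.format(Locale.ROOT, "%020d.snapshot", offset);
    }

    // A locally taken snapshot is written to disk AND tracked in the map.
    void takeLocalSnapshot(long offset) {
        String name = fileName(offset);
        disk.add(name);
        snapshots.put(offset, name);
    }

    // Assumption: a remote download writes the file but does not touch the map.
    void downloadFromRemote(long offset) {
        disk.add(fileName(offset));
    }

    // Assumption: a full truncation deletes the file of every tracked offset.
    void truncateFully() {
        for (String name : snapshots.values()) {
            disk.remove(name);
        }
        snapshots.clear();
    }

    boolean existsOnDisk(long offset) {
        return disk.contains(fileName(offset));
    }

    public static void main(String[] args) {
        // Downloaded offset equals an offset already tracked locally.
        SnapshotCollisionSketch colliding = new SnapshotCollisionSketch();
        colliding.takeLocalSnapshot(100L);
        colliding.downloadFromRemote(100L);
        colliding.truncateFully();
        System.out.println("colliding offset survives: " + colliding.existsOnDisk(100L)); // false

        // Downloaded offset is not tracked locally.
        SnapshotCollisionSketch distinct = new SnapshotCollisionSketch();
        distinct.takeLocalSnapshot(100L);
        distinct.downloadFromRemote(200L);
        distinct.truncateFully();
        System.out.println("distinct offset survives: " + distinct.existsOnDisk(200L)); // true
    }
}
```

In this model the downloaded file is lost only when its offset is already a key in the map, which matches the condition stated in the reply. Whether the real code path behaves this way is the open question in this thread.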