ocadaruma commented on code in PR #15993:
URL: https://github.com/apache/kafka/pull/15993#discussion_r1678881429
##########
storage/src/main/java/org/apache/kafka/storage/internals/epoch/LeaderEpochFileCache.java:
##########
@@ -334,35 +361,55 @@ public void truncateFromEnd(long endOffset) {
* be offset, then clears any previous epoch entries.
* <p>
* This method is exclusive: so truncateFromStart(6) will retain an entry
at offset 6.
+ * <p>
+ * Checkpoint-flushing is done asynchronously.
*
* @param startOffset the offset to clear up to
*/
- public void truncateFromStart(long startOffset) {
+ public void truncateFromStartAsyncFlush(long startOffset) {
lock.writeLock().lock();
try {
- List<EpochEntry> removedEntries = removeFromStart(entry -> entry.startOffset <= startOffset);
-
+ List<EpochEntry> removedEntries = truncateFromStart(epochs, startOffset);
if (!removedEntries.isEmpty()) {
- EpochEntry firstBeforeStartOffset = removedEntries.get(removedEntries.size() - 1);
- EpochEntry updatedFirstEntry = new EpochEntry(firstBeforeStartOffset.epoch, startOffset);
- epochs.put(updatedFirstEntry.epoch, updatedFirstEntry);
-
- // We intentionally don't force flushing change to the device here because:
+ // We flush the change to the device in the background because:
// - To avoid fsync latency
// * fsync latency could be huge on a disk glitch, which is not rare in spinning drives
// * This method is called as part of deleteRecords with holding UnifiedLog#lock.
// - Meanwhile all produces against the partition will be blocked, which causes req-handlers to exhaust
- // - Even when stale epochs remained in LeaderEpoch file due to the unclean shutdown, it will be recovered by
- // another truncateFromStart call on log loading procedure so it won't be a problem
- writeToFile(false);
+ // - We still flush the change in #assign synchronously, meaning that it's guaranteed that the checkpoint file always has no missing entries.
+ // * Even when stale epochs are restored from the checkpoint file after the unclean shutdown, it will be handled by
+ // another truncateFromStart call on log loading procedure, so it won't be a problem
+ scheduler.scheduleOnce("leader-epoch-cache-flush-" + topicPartition, this::writeToFileForTruncation);
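The pattern in this hunk can be sketched in isolation as follows. This is a minimal, illustrative sketch only, not Kafka's actual implementation: `ScheduledExecutorService` stands in for Kafka's `Scheduler`, a plain `TreeMap` (epoch -> start offset) stands in for the epoch cache, and the class and field names are hypothetical. It shows the key idea of the change: `assign` flushes the checkpoint synchronously so the file never misses entries, while truncation from the start mutates the in-memory state under the write lock and hands the flush to a background thread so a caller holding the log lock is not stalled by fsync latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AsyncFlushSketch {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final TreeMap<Integer, Long> epochs = new TreeMap<>(); // epoch -> start offset
    final ScheduledExecutorService scheduler;              // stand-in for Kafka's Scheduler
    volatile boolean flushed = false;

    AsyncFlushSketch(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    // Analogous to #assign: mutate and flush synchronously, so the
    // checkpoint never misses an entry.
    void assign(int epoch, long startOffset) {
        lock.writeLock().lock();
        try {
            epochs.put(epoch, startOffset);
            writeToFile();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Analogous to truncateFromStartAsyncFlush: mutate under the write
    // lock, but schedule the checkpoint write on a background thread.
    void truncateFromStartAsyncFlush(long startOffset) {
        lock.writeLock().lock();
        try {
            List<Integer> removed = new ArrayList<>();
            for (Map.Entry<Integer, Long> e : epochs.entrySet())
                if (e.getValue() <= startOffset) removed.add(e.getKey());
            if (!removed.isEmpty()) {
                int lastEpoch = removed.get(removed.size() - 1);
                removed.forEach(epochs::remove);
                // Retain the covering epoch, clamped to the new start offset.
                epochs.put(lastEpoch, startOffset);
                scheduler.schedule(this::writeToFile, 0, TimeUnit.MILLISECONDS);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    void writeToFile() { flushed = true; } // placeholder for the checkpoint fsync

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AsyncFlushSketch cache = new AsyncFlushSketch(ses);
        cache.assign(0, 0L);
        cache.assign(1, 10L);
        cache.assign(2, 20L);
        cache.flushed = false;
        cache.truncateFromStartAsyncFlush(15L);
        ses.shutdown();
        ses.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(cache.epochs);  // {1=15, 2=20}
        System.out.println(cache.flushed); // true
    }
}
```

Note that the truncation keeps the last removed epoch with its start offset clamped to `startOffset`, matching the "retain an entry at offset 6" behavior in the javadoc; the placeholder `writeToFile` only records that a flush happened, whereas the real patch writes the checkpoint file.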
Review Comment:
Oh, thanks for pointing out. Let me check
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]