Hangleton commented on code in PR #14483: URL: https://github.com/apache/kafka/pull/14483#discussion_r1380576649
########## storage/src/main/java/org/apache/kafka/storage/internals/log/RemoteIndexCache.java: ########## @@ -196,12 +197,27 @@ public void remove(Uuid key) { public void removeAll(Collection<Uuid> keys) { lock.readLock().lock(); try { - internalCache.invalidateAll(keys); + keys.forEach(key -> internalCache.asMap().computeIfPresent(key, (k, v) -> { + enqueueEntryForCleanup(v, k); + // Returning null to remove the key from the cache + return null; + })); } finally { lock.readLock().unlock(); } } + private void enqueueEntryForCleanup(Entry entry, Uuid key) { + try { + entry.markForCleanup(); + if (!expiredIndexes.offer(entry)) { + log.error("Error while inserting entry {} for key {} into the cleaner queue because queue is full.", entry, key); + } + } catch (IOException e) { Review Comment: The cache stores its files [in the first log directory](https://github.com/apache/kafka/blob/4d8efa94cbe6fd74d23dae2608058db52521bb2c/core/src/main/scala/kafka/server/KafkaServer.scala#L619) defined in server properties. Because I/O exceptions captured here originate from a log directory, why shouldn't the pattern which applies to log directory failures in Kafka not be valid here? Regarding the I/O exceptions surfaced from `copyLogSegment` - are we talking about those which occurs from within a plugin? If that is the case, then they are of a different nature since not applying to log directories, and by design the plugin implementation needs to handle those while the remote log manager provides a retry mechanism via the periodic scheduling of the RLM tasks. The idea is that unlike file system I/O errors on a log directory which are directly putting local data integrity at risk, transient I/O errors are common for clients transferring data to or from external services e.g. a public cloud storage for which availability recovers on its own and given that durability is not compromised as long as consistency of metadata updates is guaranteed. So the semantic of the I/O exception, the corresponding failure modes, and their implications, are not the same. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org