Re: [PR] KAFKA-15481: Fix concurrency bug in RemoteIndexCache [kafka]

via GitHub Thu, 02 Nov 2023 10:59:09 -0700


Hangleton commented on code in PR #14483:
URL: https://github.com/apache/kafka/pull/14483#discussion_r1380576649



##########
storage/src/main/java/org/apache/kafka/storage/internals/log/RemoteIndexCache.java:
##########
@@ -196,12 +197,27 @@ public void remove(Uuid key) {
     public void removeAll(Collection<Uuid> keys) {
         lock.readLock().lock();
         try {
-            internalCache.invalidateAll(keys);
+            keys.forEach(key -> internalCache.asMap().computeIfPresent(key, (k, v) -> {
+                enqueueEntryForCleanup(v, k);
+                // Returning null to remove the key from the cache
+                return null;
+            }));
         } finally {
             lock.readLock().unlock();
         }
     }
 
+    private void enqueueEntryForCleanup(Entry entry, Uuid key) {
+        try {
+            entry.markForCleanup();
+            if (!expiredIndexes.offer(entry)) {
+                log.error("Error while inserting entry {} for key {} into the cleaner queue because queue is full.", entry, key);
+            }
+        } catch (IOException e) {

Review Comment:
   The cache stores its files
[in the first log directory](https://github.com/apache/kafka/blob/4d8efa94cbe6fd74d23dae2608058db52521bb2c/core/src/main/scala/kafka/server/KafkaServer.scala#L619)
defined in the server properties. Because the I/O exceptions caught here
originate from a log directory, why shouldn't the pattern that applies to log
directory failures in Kafka be valid here?
   
   Regarding the I/O exceptions surfaced from `copyLogSegment`: are we talking
about those which occur within a plugin? If so, they are of a different
nature, since they do not apply to log directories. By design, the plugin
implementation needs to handle them, while the remote log manager provides a
retry mechanism via the periodic scheduling of the RLM tasks. The idea is that
file system I/O errors on a log directory directly put local data integrity at
risk, whereas transient I/O errors are common for clients transferring data to
or from external services (e.g. public cloud storage), whose availability
recovers on its own. In that case durability is not compromised as long as the
consistency of metadata updates is guaranteed. So the semantics of the I/O
exception, the corresponding failure modes, and their implications are not the
same.
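
   To make the distinction concrete, here is a minimal sketch of the two handling styles. Every name in it (`IoFailureHandlingSketch`, `IoAction`, `runLocal`, `runRemote`) is hypothetical and chosen for illustration only; none of this is Kafka's actual log directory failure handling or remote log manager code. It assumes only that a local failure takes the directory out of service, while a remote failure is simply left for the next periodic run.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Kafka code: contrasts the two I/O failure modes.
class IoFailureHandlingSketch {

    /** Hypothetical stand-in for any operation that may fail with an I/O error. */
    interface IoAction {
        void run() throws IOException;
    }

    // Log directories taken out of service after a local I/O failure.
    private final Set<String> offlineLogDirs = ConcurrentHashMap.newKeySet();
    // Segments whose remote copy failed and is left for the next periodic run.
    private final Set<String> pendingRemoteCopies = ConcurrentHashMap.newKeySet();

    /** Local file system error: local data integrity is at risk, so the directory goes offline. */
    void runLocal(String logDir, IoAction action) {
        try {
            action.run();
        } catch (IOException e) {
            offlineLogDirs.add(logDir);
        }
    }

    /** Remote (plugin) error: assumed transient, so the segment is only kept for a later retry. */
    boolean runRemote(String segment, IoAction copy) {
        try {
            copy.run();
            pendingRemoteCopies.remove(segment);
            return true;
        } catch (IOException e) {
            pendingRemoteCopies.add(segment);
            return false;
        }
    }

    boolean isOffline(String logDir) {
        return offlineLogDirs.contains(logDir);
    }

    boolean isPending(String segment) {
        return pendingRemoteCopies.contains(segment);
    }
}
```

   In this sketch a failed remote copy changes no state other than the retry set, so calling `runRemote` again on the next scheduled run is safe, whereas a single local failure is treated as permanent for that directory.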



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
