mchades commented on code in PR #10740:
URL: https://github.com/apache/gravitino/pull/10740#discussion_r3196070124


##########
core/src/main/java/org/apache/gravitino/storage/relational/RelationalEntityStore.java:
##########
@@ -183,8 +184,9 @@ public <E extends Entity & HasIdentifier> List<E> batchGet(
   public boolean delete(NameIdentifier ident, Entity.EntityType entityType, boolean cascade)
       throws IOException {
     try {
+      boolean deleted = backend.delete(ident, entityType, cascade);
       cache.invalidate(ident, entityType);
-      return backend.delete(ident, entityType, cascade);
+      return deleted;
     } catch (NoSuchEntityException e) {

Review Comment:
   It seems this comment has not been resolved yet?



##########
core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java:
##########
@@ -806,8 +809,11 @@ public boolean dropCatalog(NameIdentifier ident, boolean force)
             }
 
             // Finally, delete the catalog entity as well as all its sub-entities from the store.
+            // Invalidate after store.delete() to prevent a background thread from repopulating
+            // the cache with stale data between invalidate and delete.
+            boolean deleted = store.delete(ident, EntityType.CATALOG, true);
             catalogCache.invalidate(ident);

Review Comment:
   I think the ordering change here is intentional and is the right tradeoff for this PR.
   
   If we invalidate the cache before `store.delete(...)`, we open a window where a concurrent reader can see a cache miss, read the still-existing catalog from the store, and populate the cache again with stale data before the delete completes. That is the race this patch is trying to fix.
   
   The `delete succeeds but invalidate fails` case is theoretically possible, but it is a different partial-failure problem. For cache-aside, the more common write path is to mutate the source of truth first and then invalidate the cache, precisely to avoid stale refills during concurrent reads. So I would not use that scenario as a reason to move the invalidation back before the store mutation.
   
   So my take is that this PR is fixing the more immediate concurrency bug, while the post-delete invalidate failure case would be a separate follow-up discussion if we want to harden it further.
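
To make the two orderings concrete, here is a minimal single-threaded replay of the interleaving described above. It is only a sketch: `CacheAsideDeleteSketch`, the two `HashMap` stand-ins for the store and the cache, and the key `catalog1` are all hypothetical and are not Gravitino code. The "concurrent reader" is simulated by calling `read(...)` between the writer's two steps, so it shows this one interleaving rather than every possible one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical replay of the invalidate/delete interleavings; not Gravitino code. */
public class CacheAsideDeleteSketch {
  // Stand-ins for the entity store (source of truth) and the catalog cache.
  static final Map<String, String> store = new HashMap<>();
  static final Map<String, String> cache = new HashMap<>();

  /** Start each scenario with the catalog present in both the store and the cache. */
  static void reset() {
    store.clear();
    cache.clear();
    store.put("catalog1", "v1");
    cache.put("catalog1", "v1");
  }

  /** Cache-aside read: on a miss, load from the store and populate the cache. */
  static Optional<String> read(String key) {
    String cached = cache.get(key);
    if (cached != null) {
      return Optional.of(cached);
    }
    String loaded = store.get(key);
    if (loaded != null) {
      cache.put(key, loaded);
    }
    return Optional.ofNullable(loaded);
  }

  /** Old ordering: invalidate first, reader sneaks in, then delete. */
  static boolean staleAfterInvalidateThenDelete() {
    reset();
    cache.remove("catalog1"); // writer: invalidate
    read("catalog1");         // reader: miss, refills the cache from the store
    store.remove("catalog1"); // writer: delete
    return cache.containsKey("catalog1");
  }

  /** New ordering: delete first, reader sneaks in, then invalidate. */
  static boolean staleAfterDeleteThenInvalidate() {
    reset();
    store.remove("catalog1"); // writer: delete
    read("catalog1");         // reader: hits the not-yet-invalidated entry
    cache.remove("catalog1"); // writer: invalidate clears it
    return cache.containsKey("catalog1");
  }

  public static void main(String[] args) {
    System.out.println("invalidate-then-delete leaves stale entry: "
        + staleAfterInvalidateThenDelete());
    System.out.println("delete-then-invalidate leaves stale entry: "
        + staleAfterDeleteThenInvalidate());
  }
}
```

In this replay the first ordering leaves a cache entry for a catalog that no longer exists in the store, while the second ordering only allows a transient stale read that the trailing invalidate then clears.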


