imbajin commented on code in PR #3011:
URL: https://github.com/apache/hugegraph/pull/3011#discussion_r3154362425


##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java:
##########
@@ -43,6 +45,13 @@
 
 public class CachedSchemaTransactionV2 extends SchemaTransactionV2 {
 
+    private static final String ID_CACHE_PREFIX = "schema-id";
+    private static final String NAME_CACHE_PREFIX = "schema-name";
+
+    // MetaDriver doesn't expose unlisten, register the meta listener once.
+    private static final AtomicBoolean metaEventListenerRegistered =
+            new AtomicBoolean(false);

Review Comment:
   ‼️ **CRITICAL: Static `AtomicBoolean` has no reset path on MetaManager 
reconnect — silent listener loss**
   
   If the MetaManager transport reconnects (etcd client reconnect after network 
partition), the underlying watch subscription is silently lost but 
`metaEventListenerRegistered` remains `true` forever, preventing 
re-registration. After a reconnect, this JVM stops receiving cross-node 
schema-cache-clear events with no error or warning.
   
   Also: `unlistenChanges()` clears store/EventHub listeners but deliberately 
leaves the meta listener as a ghost (by design per the comment). After a graph 
is dropped, the meta listener continues firing `clearSchemaCache()` for that 
graph name — harmless because `CacheManager` returns null for non-existent 
caches, but this gap in lifecycle ownership should be explicitly documented.
   
   Minimum fix: log a warning if MetaManager reconnects without 
re-registration, and consider adding a reconnect hook.



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java:
##########
@@ -202,6 +268,7 @@ protected void updateSchema(SchemaElement schema,
         super.updateSchema(schema, updateCallback);
 
         this.updateCache(schema);
+        this.maybeNotifySchemaCacheClear();
     }

Review Comment:
   ‼️ **CRITICAL: `updateSchema()` fires a full cluster-wide schema cache clear 
on *every* status transition, causing notification storms**
   
   `CachedSchemaTransactionV2.updateSchema()` is invoked by 
`SchemaTransactionV2.updateSchemaStatus()`, which background jobs call multiple 
times per operation:
   
   - `IndexLabelRebuildJob`: 4 calls (REBUILDING → INVALID → CREATED + error 
rollback)
   - `EdgeLabelRemoveJob` / `VertexLabelRemoveJob` / `IndexLabelRemoveJob`: 2–3 
calls each
   
   Each call now fires `MetaManager.notifySchemaCacheClear()`, broadcasting a 
**full clear of all PK/VL/EL/IL caches** across all cluster nodes. A single 
`rebuildIndex` on a 5-node cluster generates ~20 full-cache-clear broadcasts 
during normal background execution.
   
   Note: `CachedSchemaTransaction` (V1) deliberately did NOT notify in 
`updateSchema()` — status transitions are internal bookkeeping, not topology 
changes. The fix should be scoped to `addSchema` and `removeSchema` only:
   
   ```suggestion
       @Override
       protected void updateSchema(SchemaElement schema,
                                   Consumer<SchemaElement> updateCallback) {
           super.updateSchema(schema, updateCallback);
   
           this.updateCache(schema);
           // Status transitions are internal bookkeeping; notifying here 
causes a
           // broadcast storm for every updateSchemaStatus() call from 
background jobs.
       }
   ```



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java:
##########
@@ -145,12 +180,43 @@ private void listenChanges() {
         if (!schemaEventHub.containsListener(Events.CACHE)) {
             schemaEventHub.listen(Events.CACHE, this.cacheEventListener);
         }
+
+        listenSchemaCacheClear();
+    }
+
+    private static void listenSchemaCacheClear() {
+        if (!metaEventListenerRegistered.compareAndSet(false, true)) {
+            return;
+        }
+
+        try {
+            MetaManager.instance().listenSchemaCacheClear(response -> {
+                List<String> graphNames =
+                        MetaManager.instance().metaDriver()
+                                   .extractValuesFromResponse(response);
+                if (graphNames == null) {
+                    return;
+                }
+                for (String graphName : graphNames) {
+                    LOG.debug("Graph {} clear schema cache on meta event",
+                              graphName);
+                    clearSchemaCache(graphName);
+                }
+            });
+        } catch (RuntimeException e) {
+            metaEventListenerRegistered.set(false);

Review Comment:
   ⚠️ **Exception catch is too narrow — non-`RuntimeException` failures leave 
`AtomicBoolean` stuck at `true`**
   
   If `MetaManager.instance().listenSchemaCacheClear()` throws a checked 
exception (wrapped or not) or an `Error`, the reset at line 208 is never 
reached and the listener is permanently skipped for this JVM.
   
   ```suggestion
           } catch (Exception e) {
               metaEventListenerRegistered.set(false);
               throw e instanceof RuntimeException ? (RuntimeException) e
                                                   : new 
RuntimeException("Failed to register schema cache clear listener", e);
           }
   ```



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java:
##########
@@ -145,12 +180,43 @@ private void listenChanges() {
         if (!schemaEventHub.containsListener(Events.CACHE)) {
             schemaEventHub.listen(Events.CACHE, this.cacheEventListener);
         }
+
+        listenSchemaCacheClear();
+    }
+
+    private static void listenSchemaCacheClear() {
+        if (!metaEventListenerRegistered.compareAndSet(false, true)) {
+            return;
+        }
+
+        try {
+            MetaManager.instance().listenSchemaCacheClear(response -> {
+                List<String> graphNames =
+                        MetaManager.instance().metaDriver()
+                                   .extractValuesFromResponse(response);

Review Comment:
   ⚠️ **Direct `metaDriver()` call breaks the MetaManager abstraction layer**
   
   ```java
   MetaManager.instance().metaDriver().extractValuesFromResponse(response)
   ```
   
   The cache package accessing `MetaDriver` internals directly couples 
`CachedSchemaTransactionV2` to the meta package's internals. `MetaManager` 
already exposes typed extraction helpers (`extractGraphsFromResponse`, 
`extractServicesFromResponse`, etc.). This PR added `metaDriver()` as a public 
accessor just for this one use case — the right fix is a dedicated helper on 
`MetaManager`:
   
   ```suggestion
                   List<String> graphNames =
                           MetaManager.instance()
                                      
.extractSchemaCacheClearGraphNamesFromResponse(response);
   ```



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java:
##########
@@ -86,11 +95,36 @@ public void close() {
     }
 
     private Cache<Id, Object> cache(String prefix, long capacity) {
-        final String name = prefix + "-" + this.graph().spaceGraphName();
+        final String name = cacheName(prefix, this.graph().spaceGraphName());
         // NOTE: must disable schema cache-expire due to getAllSchema()
         return CacheManager.instance().cache(name, capacity);
     }
 
+    private static String cacheName(String prefix, String spaceGraphName) {
+        return prefix + "-" + spaceGraphName;
+    }
+
+    private static void clearSchemaCache(String spaceGraphName) {
+        Map<String, Cache<Id, Object>> caches = 
CacheManager.instance().caches();
+
+        Cache<Id, Object> idCache = caches.get(cacheName(ID_CACHE_PREFIX,
+                                                         spaceGraphName));
+        if (idCache != null) {
+            idCache.clear();
+

Review Comment:
   ⚠️ **`clearSchemaCache` is non-atomic: TOCTOU window between 
`idCache.clear()` and `nameCache.clear()`**
   
   Between these two `clear()` calls a reader thread can see `idCache` empty 
while `nameCache` still holds stale entries. `getSchema(type, name)` checks 
`nameCache` first — a hit from the still-live `nameCache` bypasses any reload 
logic and returns a stale object. This same window exists in the 
instance-method `clearCache()` and both should be addressed together.



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java:
##########
@@ -145,12 +180,43 @@ private void listenChanges() {
         if (!schemaEventHub.containsListener(Events.CACHE)) {
             schemaEventHub.listen(Events.CACHE, this.cacheEventListener);
         }
+
+        listenSchemaCacheClear();
+    }
+
+    private static void listenSchemaCacheClear() {
+        if (!metaEventListenerRegistered.compareAndSet(false, true)) {
+            return;

Review Comment:
   ⚠️ **`AtomicBoolean` is set `true` *before* the listener is fully registered 
— narrow race window**
   
   The flag is flipped via `compareAndSet(false, true)` before 
`MetaManager.instance().listenSchemaCacheClear(...)` completes the underlying 
gRPC watch setup. Another thread calling `listenSchemaCacheClear()` in the same 
window sees `true` and returns immediately, but the subscription may not yet be 
live. Events published in that window are silently dropped.
   
   The `catch` rollback handles construction failures, not the concurrent setup 
race. Consider wrapping the CAS-to-registration sequence in a `synchronized` 
block on a class-level lock.



##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java:
##########
@@ -238,6 +301,10 @@ public void removeSchema(SchemaElement schema) {
 
         this.invalidateCache(schema.type(), schema.id());
 
+        this.maybeNotifySchemaCacheClear();
+    }
+
+    private void maybeNotifySchemaCacheClear() {
         if (!this.graph().option(CoreOptions.TASK_SYNC_DELETION)) {
             MetaManager.instance()

Review Comment:
   🧹 **`TASK_SYNC_DELETION` gate in `maybeNotifySchemaCacheClear` is 
semantically mismatched for `addSchema`/`updateSchema`**
   
   `CoreOptions.TASK_SYNC_DELETION` controls whether task-based schema 
*deletion* runs synchronously. Using it to suppress `addSchema` and 
`updateSchema` notifications means: when `TASK_SYNC_DELETION=true` (typical 
test/single-node config), schema additions and updates are silently never 
propagated to remote nodes — even though they have nothing to do with deletion 
tasks.
   
   As a side effect, integration tests that run with `TASK_SYNC_DELETION=true` 
never exercise the notification path added by this PR. Consider separating the 
guard: `removeSchema` only for the deletion gate, always notify for `addSchema`.



##########
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/cache/CachedSchemaTransactionTest.java:
##########
@@ -165,6 +172,98 @@ public void testGetSchema() throws Exception {
                             cache.getPropertyKey(IdGenerator.of(1)).name());
     }
 
+    @Test
+    public void testClearV2SchemaCacheByGraphName() {
+        String graphName = "DEFAULT-unit-test-v2";
+        String otherGraphName = "DEFAULT-other-v2";
+
+        Cache<Id, Object> idCache = CacheManager.instance()
+                                                .cache("schema-id-" +
+                                                       graphName, 10L);
+        Cache<Id, Object> nameCache = CacheManager.instance()
+                                                  .cache("schema-name-" +
+                                                         graphName, 10L);
+        Cache<Id, Object> otherIdCache = CacheManager.instance()
+                                                     .cache("schema-id-" +
+                                                            otherGraphName,
+                                                            10L);
+        Object arrayCaches = idCache.attachment(newV2SchemaCaches(10));
+        Id arrayCacheId = IdGenerator.of(1);
+        SchemaElement arrayCacheSchema =
+                new FakeObjects("unit-test-v2")
+                        .newPropertyKey(arrayCacheId, "fake-pk-array");
+
+        try {
+            clearV2SchemaCaches(arrayCaches);
+            setV2SchemaCache(arrayCaches, HugeType.PROPERTY_KEY, arrayCacheId,
+                             arrayCacheSchema);
+            idCache.update(IdGenerator.of(1), "fake-pk-by-id");
+            nameCache.update(IdGenerator.of("fake-pk"), "fake-pk-by-name");
+            otherIdCache.update(IdGenerator.of(2), "other-pk-by-id");
+
+            Assert.assertEquals(1L, idCache.size());
+            Assert.assertEquals(1L, nameCache.size());
+            Assert.assertEquals(1L, otherIdCache.size());
+            Assert.assertSame(arrayCacheSchema,
+                              getV2SchemaCache(arrayCaches,
+                                               HugeType.PROPERTY_KEY,
+                                               arrayCacheId));
+
+            Whitebox.invokeStatic(CachedSchemaTransactionV2.class,
+                                  new Class<?>[]{String.class},
+                                  "clearSchemaCache", graphName);
+

Review Comment:
   ⚠️ **Test only covers `clearSchemaCache()` in isolation via reflection — the 
end-to-end listener path is untested**
   
   The test calls the private static method directly via 
`Whitebox.invokeStatic`. It does not verify:
   - That `listenSchemaCacheClear()` actually registers a MetaManager watcher
   - That `maybeNotifySchemaCacheClear()` → MetaManager publish → listener 
callback → `clearSchemaCache()` completes end-to-end
   - The `AtomicBoolean` idempotency: two graph instances in the same JVM 
should register only one listener
   - The `TASK_SYNC_DELETION=true` suppression path (which also means CI tests 
never exercise the notification)
   
   The behavior this PR claims to fix — cross-node cache sync — has zero test 
coverage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to