dpol1 opened a new issue, #3036:
URL: https://github.com/apache/hugegraph/issues/3036

   - Follow-up to #3011.
   
   That PR made `CachedSchemaTransactionV2` listen for `schema-cache-clear` meta
   events and drop the matching local V2 caches, so a schema change on one server
   no longer leaves other nodes serving stale schema in HStore / multi-server
   mode.
   
   The catch is the listener registers exactly once per JVM. The gRPC watch
   behind it is process-wide and there is no `unlisten`. If the Meta transport
   reconnects and the old watch gets dropped, the listener goes deaf and nothing
   brings it back. There is already a manual recovery hook,
   `resetMetaListenerForReconnect()`, but nothing calls it, because Meta exposes
   no reconnect signal to hang it on.
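
The failure mode can be reproduced in miniature. The sketch below is not HugeGraph code: `FakeMetaTransport` and `OnceOnlyListener` are hypothetical stand-ins, and the register-once guard is modelled with an instance field rather than real JVM-wide state. It only illustrates how a register-once guard combined with a dropped watch leaves the listener deaf until a reset is called by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical stand-in for the Meta transport: it holds watches and
// silently loses them when the connection is re-established.
final class FakeMetaTransport {
    private final List<Consumer<String>> watches = new CopyOnWriteArrayList<>();

    void listen(Consumer<String> watch) {
        watches.add(watch);
    }

    void publish(String event) {
        for (Consumer<String> watch : watches) {
            watch.accept(event);
        }
    }

    // Simulates a reconnect that drops the old watch without telling anyone.
    void reconnect() {
        watches.clear();
    }
}

// Hypothetical listener with a register-once guard and a manual reset hook.
final class OnceOnlyListener {
    private final AtomicBoolean registered = new AtomicBoolean(false);
    final List<String> received = new ArrayList<>();

    void ensureListening(FakeMetaTransport meta) {
        // Only the first call registers; every later call is a no-op.
        if (registered.compareAndSet(false, true)) {
            meta.listen(received::add);
        }
    }

    // Manual recovery: clear the guard and register again.
    void resetForReconnect(FakeMetaTransport meta) {
        registered.set(false);
        ensureListening(meta);
    }
}

final class DeafListenerDemo {
    public static void main(String[] args) {
        FakeMetaTransport meta = new FakeMetaTransport();
        OnceOnlyListener listener = new OnceOnlyListener();

        listener.ensureListening(meta);
        meta.publish("clear-1");          // delivered

        meta.reconnect();                 // old watch is gone
        listener.ensureListening(meta);   // guard is already set: no-op
        meta.publish("clear-2");          // lost, nobody is listening

        listener.resetForReconnect(meta); // manual recovery
        meta.publish("clear-3");          // delivered again

        System.out.println(listener.received); // prints [clear-1, clear-3]
    }
}
```

The second event is lost because nothing tells the listener that its watch was dropped, which is the gap described above.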
   
   The two halves of the gap, both in `CachedSchemaTransactionV2`:
   
   - The listener-lifetime comment explaining why recovery is not automatic today:
     https://github.com/apache/hugegraph/blob/master/hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java#L52-L58
   - The `resetMetaListenerForReconnect()` hook that has no caller and the javadoc
     saying it must be wired to a Meta reconnect callback:
     https://github.com/apache/hugegraph/blob/master/hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java#L254-L270
   
   ### What should happen
   
   After the Meta transport reconnects, the JVM-wide `schema-cache-clear` listener
   comes back on its own and keeps delivering events for every graph. No operator
   action.
   
   ### What happens now
   
   `MetaManager` / `MetaDriver` expose `listen` and `keepAlive` but no reconnect
   callback. A dropped watch goes unnoticed, events stop arriving, and the node
   can keep serving stale schema until someone calls
   `resetMetaListenerForReconnect()` by hand. Nothing does.
   
   ### Proposed fix
   
   Two parts, same theme, easiest to do in one go:
   
   1. Give `MetaManager` / `MetaDriver` a reconnect callback (something like
      `listenReconnect` / `onTransportReconnect`) that fires when the transport
      reconnects and the previous watch is gone.
   2. Point `CachedSchemaTransactionV2` at that callback so it re-registers the
      listener. `resetMetaListenerForReconnect()` stops being a manual entry
      point and becomes the callback target.
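
A rough sketch of how the two parts could fit together follows. Everything in it is an assumption made for illustration: `SketchMetaManager`, `SketchSchemaCache`, the `Runnable`-based `onTransportReconnect` signature and `simulateReconnect` are hypothetical and do not reflect the current `MetaManager` / `MetaDriver` API or the real signature of `resetMetaListenerForReconnect()`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical Meta facade. `onTransportReconnect` is the proposed addition
// (part 1); it is not an existing HugeGraph API.
final class SketchMetaManager {
    private final List<Consumer<String>> watches = new CopyOnWriteArrayList<>();
    private final List<Runnable> reconnectCallbacks = new CopyOnWriteArrayList<>();

    void listen(Consumer<String> watch) {
        watches.add(watch);
    }

    void onTransportReconnect(Runnable callback) {
        reconnectCallbacks.add(callback);
    }

    void publish(String event) {
        for (Consumer<String> watch : watches) {
            watch.accept(event);
        }
    }

    // Test helper: the previous watches are gone, then the callbacks fire.
    void simulateReconnect() {
        watches.clear();
        for (Runnable callback : reconnectCallbacks) {
            callback.run();
        }
    }
}

// Hypothetical cache owner that wires its reset hook to the callback (part 2).
final class SketchSchemaCache {
    private final SketchMetaManager meta;
    private final AtomicBoolean listening = new AtomicBoolean(false);
    final AtomicInteger cacheClears = new AtomicInteger();

    private SketchSchemaCache(SketchMetaManager meta) {
        this.meta = meta;
    }

    static SketchSchemaCache attach(SketchMetaManager meta) {
        SketchSchemaCache cache = new SketchSchemaCache(meta);
        // The reset hook becomes the reconnect callback target.
        meta.onTransportReconnect(cache::resetMetaListenerForReconnect);
        cache.ensureListening();
        return cache;
    }

    private void ensureListening() {
        // Register at most once until the guard is reset.
        if (listening.compareAndSet(false, true)) {
            meta.listen(event -> cacheClears.incrementAndGet());
        }
    }

    void resetMetaListenerForReconnect() {
        listening.set(false);
        ensureListening();
    }
}

final class ReconnectWiringDemo {
    public static void main(String[] args) {
        SketchMetaManager meta = new SketchMetaManager();
        SketchSchemaCache cache = SketchSchemaCache.attach(meta);

        meta.publish("schema-cache-clear"); // delivered
        meta.simulateReconnect();           // watch dropped, then re-registered
        meta.publish("schema-cache-clear"); // delivered again, no manual step

        System.out.println(cache.cacheClears.get()); // prints 2
    }
}
```

In this sketch the register-once guard is unchanged. The only new behaviour is that the reset runs from the reconnect callback, so no operator has to call it.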
   
   ### How to verify
   
   Two servers against HStore/Meta. Force a Meta reconnect (restart Meta or kill
   the connection), then change a schema on server A and assert server B clears
   its V2 caches and stops returning stale schema.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.