dpol1 opened a new issue, #3036:
URL: https://github.com/apache/hugegraph/issues/3036
- Follow-up to #3011.
That PR made `CachedSchemaTransactionV2` listen for `schema-cache-clear` meta
events and drop the matching local V2 caches, so a schema change on one server
no longer leaves other nodes serving stale schema in HStore / multi-server
mode.
The catch is the listener registers exactly once per JVM. The gRPC watch
behind it is process-wide and there is no `unlisten`. If the Meta transport
reconnects and the old watch gets dropped, the listener goes deaf and nothing
brings it back. There is already a manual recovery hook,
`resetMetaListenerForReconnect()`, but nothing calls it, because Meta exposes
no reconnect signal to hang it on.
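
To make the failure mode concrete, here is a minimal sketch of a once-per-JVM registration guard. It is a model, not the actual `CachedSchemaTransactionV2` code: the class `OnceOnlyListenerSketch`, the `ensureListening()` method, the `REGISTERED` flag and the `WATCHES_OPENED` counter are all hypothetical, and the body given to `resetMetaListenerForReconnect()` is an assumption about how such a hook could work.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a process-wide watch with no unlisten; not HugeGraph code.
class OnceOnlyListenerSketch {
    // Guard that lets registration happen only once per JVM.
    private static final AtomicBoolean REGISTERED = new AtomicBoolean(false);
    // Counts how many times a watch was actually opened.
    static final AtomicInteger WATCHES_OPENED = new AtomicInteger();

    // Called by every transaction instance; only the first call opens a watch.
    static void ensureListening() {
        if (REGISTERED.compareAndSet(false, true)) {
            WATCHES_OPENED.incrementAndGet();
        }
    }

    // Assumed shape of the manual hook: clear the guard so the next
    // ensureListening() call opens a fresh watch.
    static void resetMetaListenerForReconnect() {
        REGISTERED.set(false);
    }
}
```

In this model, if the transport drops the watch, later `ensureListening()` calls are no-ops because the guard is still set. Only an explicit `resetMetaListenerForReconnect()` lets a new watch be opened, which is why the hook needs a caller.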
The two halves of the gap, both in `CachedSchemaTransactionV2`:
- The listener-lifetime comment explaining why recovery is not automatic
  today:
  https://github.com/apache/hugegraph/blob/master/hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java#L52-L58
- The `resetMetaListenerForReconnect()` hook that has no caller and the
  javadoc saying it must be wired to a Meta reconnect callback:
  https://github.com/apache/hugegraph/blob/master/hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java#L254-L270
### What should happen
After the Meta transport reconnects, the JVM-wide `schema-cache-clear`
listener comes back on its own and keeps delivering events for every graph.
No operator action.
### What happens now
`MetaManager` / `MetaDriver` expose `listen` and `keepAlive` but no reconnect
callback. A dropped watch goes unnoticed, events stop arriving, and the node
can keep serving stale schema until someone calls
`resetMetaListenerForReconnect()` by hand. Nothing does.
### Proposed fix
Two parts, same theme, easiest to do in one go:
1. Give `MetaManager` / `MetaDriver` a reconnect callback (something like
   `listenReconnect` / `onTransportReconnect`) that fires when the transport
   reconnects and the previous watch is gone.
2. Point `CachedSchemaTransactionV2` at that callback so it re-registers the
   listener. `resetMetaListenerForReconnect()` stops being a manual entry
   point and becomes the callback target.
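
The two parts could fit together roughly as sketched below. This is an illustration under stated assumptions, not the real `MetaDriver` or `CachedSchemaTransactionV2` API: `SketchMetaDriver`, `SketchSchemaCache`, `listenSchemaCacheClear`, `publishSchemaCacheClear` and `simulateReconnect` are hypothetical, and `listenReconnect` is only the name floated in step 1.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Meta transport; not the real MetaDriver API.
class SketchMetaDriver {
    private final List<Runnable> reconnectCallbacks = new CopyOnWriteArrayList<>();
    // The single process-wide watch; null means no live watch.
    private volatile Runnable schemaCacheClearWatch;

    // Part 1: reconnect callback (name borrowed from the proposal).
    void listenReconnect(Runnable callback) {
        this.reconnectCallbacks.add(callback);
    }

    void listenSchemaCacheClear(Runnable handler) {
        this.schemaCacheClearWatch = handler;
    }

    // Simulates a reconnect: the old watch is lost, then callbacks fire.
    void simulateReconnect() {
        this.schemaCacheClearWatch = null;
        for (Runnable callback : this.reconnectCallbacks) {
            callback.run();
        }
    }

    // Simulates a schema-cache-clear event; returns whether it was delivered.
    boolean publishSchemaCacheClear() {
        Runnable watch = this.schemaCacheClearWatch;
        if (watch == null) {
            return false;
        }
        watch.run();
        return true;
    }
}

// Hypothetical stand-in for the cache side.
class SketchSchemaCache {
    private final SketchMetaDriver meta;
    // Counts cache clears triggered by delivered events.
    final AtomicInteger clears = new AtomicInteger();

    SketchSchemaCache(SketchMetaDriver meta) {
        this.meta = meta;
        registerListener();
        // Part 2: the former manual hook becomes the callback target.
        meta.listenReconnect(this::resetMetaListenerForReconnect);
    }

    private void registerListener() {
        this.meta.listenSchemaCacheClear(this.clears::incrementAndGet);
    }

    void resetMetaListenerForReconnect() {
        registerListener();
    }
}
```

With this wiring, an event published after `simulateReconnect()` is still delivered, because the callback re-registers the watch before any new event arrives.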
### How to verify
Two servers against HStore/Meta. Force a Meta reconnect (restart Meta or kill
the connection), then change a schema on server A and assert server B clears
its V2 caches and stops returning stale schema.
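
The same scenario can be rehearsed in memory before setting up real servers. The sketch below is only a simulation of the check described above: `FakeMetaBus`, `FakeServer` and the `autoRecover` flag are hypothetical, and a real verification still needs two HugeGraph servers against HStore/Meta.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for Meta.
class FakeMetaBus {
    private final List<Runnable> watches = new ArrayList<>();
    private final List<Runnable> reconnectHooks = new ArrayList<>();

    void watch(Runnable onClear) {
        this.watches.add(onClear);
    }

    void onReconnect(Runnable hook) {
        this.reconnectHooks.add(hook);
    }

    // Delivers a schema-cache-clear event to every live watch.
    void publishClear() {
        new ArrayList<>(this.watches).forEach(Runnable::run);
    }

    // Drops all watches, then runs reconnect hooks (if any were registered).
    void reconnect() {
        this.watches.clear();
        new ArrayList<>(this.reconnectHooks).forEach(Runnable::run);
    }
}

// Hypothetical server node holding a local schema cache.
class FakeServer {
    final Map<String, String> schemaCache = new HashMap<>();
    private final FakeMetaBus bus;

    FakeServer(FakeMetaBus bus, boolean autoRecover) {
        this.bus = bus;
        bus.watch(this.schemaCache::clear);
        if (autoRecover) {
            // Re-register the watch whenever the transport reconnects.
            bus.onReconnect(() -> bus.watch(this.schemaCache::clear));
        }
    }

    // A schema change on this node notifies the other nodes through Meta.
    void changeSchema() {
        this.bus.publishClear();
    }
}
```

A run would seed an entry in server B's `schemaCache`, call `reconnect()`, then call `changeSchema()` on server A. With `autoRecover` set, B's cache ends up empty; without it, the stale entry survives, which mirrors the current behaviour.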