I've been trying to port synapse[1] to use the server state file for
seamless reloads, but I'm having some trouble. It seems the state file
is essentially ignored since any change in a backend's server ordering
invalidates the state for the entire backend (since the server puids
change, even if the server id [name] stays constant). Synapse shuffles
backends on every write of the configuration to ensure that different
client machines have different starting servers (e.g. for long-lived
connections), which naturally changes puids, but even if it had a
fixed order such as sorting, whenever a server is added or removed,
the puids shift and potentially none of the state is transferred
across the reload.
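
To make the shifting concrete, here is a toy Python model. It is not
HAProxy code: assign_puids is a made-up helper, and it assumes puids
are simply 1-based positions within the backend, as described above.

```python
def assign_puids(server_names):
    """Toy model: give each server a 1-based positional id (its "puid")."""
    return {name: position for position, name in enumerate(server_names, start=1)}


# Old configuration, sorted so the order is as stable as it can be.
before = assign_puids(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])

# New configuration: the first server was removed, nothing else changed.
after = assign_puids(["10.0.0.2:80", "10.0.0.3:80"])

# Servers whose positional id survived the reload.
unchanged = [name for name in after if before.get(name) == after[name]]
```

In this model, removing the first server leaves `unchanged` empty:
every remaining server moved up one slot, so no saved state lines up.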

I guess this comes back to the id nomenclature. From what I can tell,
the server struct defines two id-like fields, (id, puid), which most
of the server.c code refers to as (name, id) = (id, puid). Somewhat
confusingly, the thing called "id" is actually puid, and it is not
really unique: it is just the position of the server within its
backend. I assume it exists because ids (names) might be duplicated.

If we set the server id to a proper identifier (e.g. a unique
addr+port+user-supplied string), then the apply server state function
ignores state whenever the server count changes because the puids
don't match. If we set the server id [name] to a constant set of
identifiers (e.g. srv1-srvN), then the apply server state function
will apply things like healthcheck state to a totally unrelated
server, which also seems bad.
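
Both failure modes can be reproduced in a small Python sketch. This is
a toy model, not the real apply server state function: the types and
helpers (Server, SavedState, build, apply_state) are hypothetical, and
the rule "restore only when puid and name both match" is my reading of
the behaviour described above.

```python
from typing import NamedTuple


class Server(NamedTuple):
    puid: int   # positional id within the backend (assumed 1-based)
    name: str   # what the config calls the server "id"
    addr: str   # the real machine behind the name


class SavedState(NamedTuple):
    puid: int
    name: str
    addr: str
    health: str


def build(names_and_addrs):
    """Assign positional puids in configuration order."""
    return [Server(i, n, a) for i, (n, a) in enumerate(names_and_addrs, start=1)]


def apply_state(saved, servers):
    """Restore a row only when both puid and name match (assumed rule).

    Returns {current addr: (restored health, addr the health came from)}.
    """
    by_key = {(row.puid, row.name): row for row in saved}
    restored = {}
    for srv in servers:
        row = by_key.get((srv.puid, srv.name))
        if row is not None:
            restored[srv.addr] = (row.health, row.addr)
    return restored


health = {"a:80": "UP", "b:80": "DOWN", "c:80": "UP"}

# Case 1: names are unique identifiers; server a:80 is then removed.
old = build([("a:80", "a:80"), ("b:80", "b:80"), ("c:80", "c:80")])
saved = [SavedState(s.puid, s.name, s.addr, health[s.addr]) for s in old]
restored_unique = apply_state(saved, build([("b:80", "b:80"), ("c:80", "c:80")]))

# Case 2: names are positional (srv1..srvN); server a:80 is then removed.
old = build([("srv1", "a:80"), ("srv2", "b:80"), ("srv3", "c:80")])
saved = [SavedState(s.puid, s.name, s.addr, health[s.addr]) for s in old]
restored_positional = apply_state(saved, build([("srv1", "b:80"), ("srv2", "c:80")]))
```

In case 1 nothing is restored. In case 2 everything "matches", but
b:80 (which was DOWN) inherits UP from a:80, and c:80 inherits DOWN
from b:80.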

Either way, the server state doesn't work as I would hope. I feel like
I sorta expect the server id (name) to actually be a unique
identifier, but that gets us back to the question of how to
dynamically update it (I think you suggested adding another
identifier, the external identifier, but now that I know there is
already a puid confusingly called an id and an id called the name, I'm
a little concerned about adding a third identifier).

What do you guys think? When using "show servers state", is it best
practice to set your server names to actual unique identifiers (in
which case the puid check in the apply server state function should
probably go away, right?) or to positional identifiers (in which case,
how do we prevent carry-over of healthcheck state between unrelated
servers? Maybe we should disable state loading if we know the servers
have moved around)? Do we need a third id?
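
For the first option, this is roughly the matching I have in mind,
again as a hypothetical Python sketch rather than a patch
(apply_state_by_name is a made-up name, and refusing to load anything
when names are duplicated is just one possible policy):

```python
def apply_state_by_name(saved, current_names):
    """Match saved state to servers by name only, ignoring position.

    saved: {server name: health string} from before the reload.
    current_names: server names in the new configuration, in any order.
    """
    if len(set(current_names)) != len(current_names):
        # Names are not unique, so name-only matching would be ambiguous.
        return {}
    return {name: saved[name] for name in current_names if name in saved}


saved = {"a:80": "UP", "b:80": "DOWN", "c:80": "UP"}

# a:80 removed, d:80 added, and the remaining servers shuffled.
restored = apply_state_by_name(saved, ["c:80", "b:80", "d:80"])
```

Here b:80 and c:80 keep their own state despite the shuffle, and the
new d:80 starts with no saved state.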

Thanks!
-Joey

[1] https://github.com/airbnb/synapse
