Hello Sergei and Yuchen!

I was looking at the code Yuchen wrote this week [1]. Thinking back
to the practical scenario of how the redirect would be used to
gracefully decommission server A and have all existing and new
connections use server B instead (as laid out in this thread at the
end of July [2]), I fail to see how it would work.

Consider this scenario with session track variable redirect_url:
1. Server A is getting 1 new connection per second, has 100 existing
active connections, of which 10 are actually doing a query and 2 of
them are long-running transactions.
2. Admin prepares server B to replicate server A data, and once
replica lag is low runs on server A: SET GLOBAL
redirect_url=mariadb://server-b.example.com
3. Server B gets 100 new connections immediately from clients that
follow the session-tracked variable. Server A continues to
authenticate and accept all new connections, which leads to server B
getting one *new* connection per second. Server B does not yet serve
traffic, so clients re-try connections/queries.
4. Server A goes into READ_ONLY mode and any client that did not
follow the redirect will get errors on INSERT/UPDATE/DELETE
5. All clients are reaching server B, which is still catching up on
replication, so clients keep re-connecting in an attempt to get
queries fulfilled
6. Server B has caught up, gets promoted to primary, and starts
accepting connections
7. Server B serves at this point all traffic, and clients/applications
continue to work without errors as long as the total switchover delay
was shorter than the configured re-try delay of clients
8. If at this point there are any clients that don't support the
session-tracked redirect_url, or that support it but do not obey it,
they will be served stale data from server A
9. Server A shuts down and no longer serves redirects or stale data
10. If any of the clients had an old version of the MariaDB connector,
they will stop working and error logs will simply say that connection
to server A failed
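
To make the failure modes in the steps above easier to compare, here is
a minimal sketch in Python. It is only a model of the scenario as I
described it, not of the actual server or connector code; the Phase
names and the outcome() function are hypothetical and exist only for
illustration.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phase names mapped onto the numbered steps above.
    REDIRECT_SET = 1  # steps 2-3: A advertises redirect_url, still read-write
    A_READ_ONLY = 2   # steps 4-5: A is read-only, B is still catching up
    B_PROMOTED = 3    # steps 6-8: B is primary, A is still running
    A_SHUT_DOWN = 4   # steps 9-10: A is gone


def outcome(follows_redirect: bool, phase: Phase, is_write: bool) -> str:
    """Return what a client observes for one query in the given phase."""
    if follows_redirect:
        # The client moved to B, which serves traffic only after promotion.
        if phase in (Phase.REDIRECT_SET, Phase.A_READ_ONLY):
            return "retry"
        return "ok"

    # The client ignores (or does not understand) the redirect and stays on A.
    if phase is Phase.REDIRECT_SET:
        return "ok"
    if phase is Phase.A_SHUT_DOWN:
        return "connection failed"
    if is_write:
        return "read-only error"
    # Reads on A are current until B is promoted and starts taking writes.
    return "ok" if phase is Phase.A_READ_ONLY else "stale read"


if __name__ == "__main__":
    for phase in Phase:
        for follows in (True, False):
            print(phase.name, "follows" if follows else "legacy",
                  outcome(follows, phase, is_write=False))
```

The case to look at is the legacy client doing reads while B is already
promoted: it gets a "stale read" and no error at all.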

I find it a bit concerning that in this model old clients, instead of
getting an error with a human-readable explanation of what happened,
continue to read stale data. The only way to prevent that is to shut
down server A quickly, after which the old clients only get an error
about the server not being reachable and nothing about the old client
not supporting redirects.

Hence, I still feel that Daniel's submission [3,4] would be a better
design, being more robust and safer. An error+redirect is easier to
reason about regarding integrity of data and error modes than the
voluntary maybe-redirect that is now in the works by Yuchen.
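
As a rough illustration of the difference, the sketch below contrasts
what a client without redirect support would see under the two designs.
It assumes that in the error+redirect design server A rejects the
client with an error whose message names the new server; the exact
error text, the design labels and the function name are my own
placeholders and are not taken from either patch.

```python
def legacy_client_sees(design: str, server_a_up: bool) -> str:
    """Model what a client without redirect support observes.

    `design` is a hypothetical label: "session_track" for the voluntary
    session-tracked redirect_url, "error_redirect" for error+redirect.
    """
    if design not in ("session_track", "error_redirect"):
        raise ValueError(f"unknown design: {design}")
    if not server_a_up:
        # Same in both designs once server A has been shut down.
        return "connection to server A failed"
    if design == "session_track":
        # The redirect is silently ignored and queries keep running on A.
        return "stale data from server A"
    # Assumed behavior: an explicit error that names the new location.
    return "error: server moved to mariadb://server-b.example.com"


if __name__ == "__main__":
    for design in ("session_track", "error_redirect"):
        print(design, "->", legacy_client_sees(design, server_a_up=True))
```

With error+redirect the old client fails loudly while server A is still
up, so there is no window in which it silently reads stale data.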

[1] https://github.com/MariaDB/server/commit/04b99a30a3f.patch
[2] https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/7UGBTX2B2KPH7UAXCCWFMUNIRQEA3WLS/#RIH7YJZIXP72L4DDFLFY37HJS2BJTZTM
[3] https://github.com/MariaDB/server/pull/2681
[4] https://github.com/mariadb-corporation/mariadb-connector-c/pull/22