On 04/30/2016 10:50 AM, Clint Byrum wrote:
Excerpts from Roman Podoliaka's message of 2016-04-29 12:04:49 -0700:


I'm curious why you think setting wsrep_sync_wait=1 wouldn't help.

The exact example appears in the Galera documentation:

http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-sync-wait

The moment you say 'SET SESSION wsrep_sync_wait=1', the behavior should
prevent the list problem you see, and it should not matter that it is
a separate session, as that is the entire point of the variable:


We prefer to keep it off and just point applications at a single node using master/passive/passive in HAProxy, so that we don't take the unnecessary performance hit of waiting for all transactions to propagate; we just stick to one node at a time. We've fixed a lot of issues in our config to ensure that HAProxy definitely keeps all clients on exactly one Galera node at a time.


"When you enable this parameter, the node triggers causality checks in
response to certain types of queries. During the check, the node blocks
new queries while the database server catches up with all updates made
in the cluster to the point where the check was begun. Once it reaches
this point, the node executes the original query."

In the active/passive case where you never use the passive node as a
read slave, one could actually set wsrep_sync_wait=1 globally. This will
cause a ton of lag while new queries happen on the new active and old
transactions are still being applied, but that's exactly what you want,
so that when you fail over, nothing proceeds until all writes from the
original active node are applied and available on the new active node.
It would help if your failover technology actually _breaks_ connections
to a presumed dead node, so writes stop happening on the old one.

If HAProxy is failing over from the master, which is no longer reachable, to another passive node, which is reachable, that means the master is partitioned and will leave the Galera primary component. It also means all current database connections are going to be dropped, which will cause errors for those clients either in the middle of an operation, or when a pooled connection is reused before it is known that the connection has been reset. So from a database client perspective, failover is usually not an error-free situation in any case, and retry schemes are always going to be needed.
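
To be concrete about what I mean by a retry scheme, here's a minimal sketch in plain SQLAlchemy; the DSN, table, and helper names are made up for illustration, and a real deployment would lean on oslo.db's retry helpers and tighter exception filtering rather than this:

    import time

    from sqlalchemy import create_engine, exc, text

    # hypothetical DSN pointing at the HAProxy frontend
    engine = create_engine("mysql+pymysql://nova:secret@haproxy-vip/nova")

    def run_with_retry(work, retries=3, delay=1):
        # retry the whole logical operation when the connection is
        # dropped out from under us, e.g. during an HAProxy failover
        for attempt in range(retries):
            try:
                with engine.begin() as conn:
                    return work(conn)
            except exc.DBAPIError as err:
                # connection_invalidated is set when SQLAlchemy detects
                # that the connection was lost mid-operation
                if not err.connection_invalidated or attempt == retries - 1:
                    raise
                time.sleep(delay)

    def mark_active(conn):
        conn.execute(
            text("UPDATE instances SET vm_state = 'active' WHERE uuid = :u"),
            {"u": "some-uuid"})

    run_with_retry(mark_active)

The important part is that the retry wraps the whole logical operation, not just the statement that happened to fail.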

Additionally, the purpose of the enginefacade [1] is to allow OpenStack applications to fix their often incorrectly written database access logic so that, in many (most?) cases, a single logical operation is no longer unnecessarily split across multiple transactions. I know that this is not always feasible where multiple web requests are coordinating, however.
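
For example, with the enginefacade decorators, nested calls join the transaction already in progress instead of each opening their own; this is just a rough sketch with a toy model and no engine configuration (normally handled through oslo.config), so take the details loosely:

    from oslo_db.sqlalchemy import enginefacade
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Instance(Base):
        # toy model, for illustration only
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        uuid = Column(String(36))
        host = Column(String(255))

    @enginefacade.transaction_context_provider
    class Context(object):
        pass

    @enginefacade.writer
    def create_instance(context, uuid):
        # joins the caller's transaction if one is already in progress
        instance = Instance(uuid=uuid)
        context.session.add(instance)
        return instance

    @enginefacade.writer
    def schedule_instance(context, uuid, host):
        # both steps of this logical operation share one transaction,
        # so they land on the database as a single atomic unit
        instance = create_instance(context, uuid)
        instance.host = host

The point is that the unit of work is defined at the top of the call chain, rather than each helper committing on its own.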

That leaves only the very infrequent scenario where the master has finished sending a write set off, the passives haven't finished committing that write set, the master goes down, and HAProxy fails over to one of the passives; the application then happens to connect fresh onto that new passive node to perform the next operation, one that relies upon the previously committed data, so it sees no database error and instead runs straight onto a node where the committed data it's expecting hasn't arrived yet.

I can't judge for all applications whether this scenario can be handled like any other transient error that occurs during a failover, but if there is a case where it can't, then IMO wsrep_sync_wait (formerly known as wsrep_causal_reads) may be used on a per-transaction basis for that very critical, not-retryable-even-during-failover operation. Allowing this variable to be set for the scope of a transaction and reset afterwards, and only when talking to Galera, is something we've planned to work into the enginefacade as well, as a declarative transaction attribute that would be a pass-through on other systems.
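
Until that declarative attribute exists, doing it by hand for one critical read looks roughly like this (illustrative names and DSN; the session variable is toggled explicitly and only for the one operation that truly can't tolerate a stale read):

    from sqlalchemy import create_engine, text

    # illustrative DSN; in practice this points at the HAProxy frontend
    engine = create_engine("mysql+pymysql://nova:secret@haproxy-vip/nova")

    def critical_read(uuid):
        with engine.connect() as conn:
            # ask Galera to block this query until the node has applied
            # every write set committed in the cluster before it began
            conn.execute(text("SET SESSION wsrep_sync_wait = 1"))
            try:
                return conn.execute(
                    text("SELECT host FROM instances WHERE uuid = :u"),
                    {"u": uuid}).first()
            finally:
                # restore the cheap default before the connection goes
                # back into the pool
                conn.execute(text("SET SESSION wsrep_sync_wait = 0"))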

[1] https://specs.openstack.org/openstack/oslo-specs/specs/kilo/make-enginefacade-a-facade.html



Also, if you thrash back and forth a bit, that could cause your app to
virtually freeze, but HAProxy and most other failover technologies allow
tuning timings so that you can stay off of a passive server long enough
to calm it down and fail more gracefully to it.

Anyway, this is why sometimes I do wonder if we'd be better off just
using MySQL with DRBD and good old pacemaker.


