Andrei Elkin <andrei.el...@mariadb.com> writes:

> I reviewed your branch to agree with the idea of xid conflicts
> handling and its implementation.
>
> I however could not understand the need of the refactoring part.
> In order to track the xid dependency couple functions and and
>   rpl_parallel_entry::maybe_active_xid
> a sort of a sliding window - that the 2nd commit of your branch
> introduced must be sufficient to my analysis.
> Of course it needs to hold worker indexes of active xid:s.
>
> This observation led me to create a review branch
>
>    origin/review__knielsen_xa_sched_minimal_fix
>
> [where  origin g...@github.com:MariaDB/server.git]
> which is a somewhat light elaboration over the 2nd commit of your branch.

The reason for the refactoring part is to allow to distribute the
transactions as evenly as possible over the worker threads, given the
scheduling constraints imposed by XA dependencies.

Let's say we have 5 worker threads, and transactions T1..T15. If there are
no scheduling constraints, we can schedule evenly like this:

  W1:  T1 T6  T11
  W2:  T2 T7  T12
  W3:  T3 T8  T13
  W4:  T4 T9  T14
  W5:  T5 T10 T15

Now suppose we need to schedule T6 on the same worker as T4, and T8 on the
same worker as T3. The most even way to distribute the transactions is now
this:

  W1:  T1 T7  T12
  W2:  T2 T9  T14
  W3:  T3 T8  T13
  W4:  T4 T6  T11
  W5:  T5 T10 T15

You see, we try to always have 5 consecutive transactions T_i, T_{i+1}, ...,
T+{i+4} scheduled on 5 different worker threads, except when this is not
possible due to dependency restrictions.

But this most-even scheduling requires to change the scheduling order of
threads from the original sequential 1,2,3,4,5. You see, the last
transactions T11..T15 are scheduled on worker threads in order: W4, W1, W3,
W2, W5.

The refactoring patch introduces thread_sched_fifo to hold the current
scheduling order of worker threads. Whenever we have to schedule
out-of-order due to a scheduling dependency, we put the scheduled worker at
the end of this fifo, to preserve even scheduling for the next N
transactions.

In the original scheduling code, we never scheduled workers out-of-order, so
the scheduling order was always cyclic W1, W2, W3, W4, W5, W1, ..., and a
simple incrementing counter was sufficient. But this is no longer sufficient
when out-of-order scheduling occurs. In the example, if we use a simple
counter and try to schedule on worker ((i+1) mod N) after worker (i), then
we get the following uneven scheduling:

  W1:  T1         T11
  W2:  T2         T12
  W3:  T3     T8  T13
  W4:  T4 T6  T9  T14
  W5:  T5 T7  T10 T15

Because T6 and T8 have particular scheduling requirements, they cause the
scheduling to skip workers. The result is that W4 and W5 get too many
transactions, while W1 and W2 get too few, and parallelism is reduced.

I hope this explain the reasoning.

> Could you please have a look at it?

If I understand your patch correctly, it schedules using a simple
incrementing counter when there are no dependency requirements, and so would
suffer from the uneven scheduling in the above example.

It also looks like there's a mistake in the code:

> +      if ((idx= check_xa_xid_dependency(&gtid_ev->xid)) < (uint32) -1)
> +      {
> +        /*
> +          A previously scheduled event group with the same XID might still be
> +          active in a worker, so schedule this event group in the same worker
> +          to avoid a conflict.
> +        */
> +      }
> +      else
> +      {
> +        /* Record this XID now active. */
> +        xid_active_generation *a=
> +          (xid_active_generation *)alloc_dynamic(&maybe_active_xid);
> +        if (!a)
> +          return NULL;
> +        ++idx;
> +        if (idx >= rpl_thread_max)
> +          idx= 0;

If check_xa_xid_dependency() returns "no dependency", then idx is set to
(uint32)-1 and the else {...} branch is executed. This does idx++, which
increments the (uint32)-1 to 0. So it looks to me that _all_ transactions
are scheduled on worker 0, or am I missing something?

This is a simple mistake, it's an easy fix to not assign idx when
check_xa_xid_dependency() returns "no dependency". But the code would still
suffer from uneven scheduling as in the example, IIUC.

So there's a good reason to introduce the thread_sched_fifo as done in my
refactoring patch.

This though still leaves the limitation in my minimal patch that XA PREPARE
xid1 cannot group commit with XA COMMIT xid1. The impact of this will depend
on how many transactions on average appear between an XA PREPARE and its
associated XA COMMIT, as well as on how expensive fsync() is relative to the
cost of each transaction.

 - Kristian.
_______________________________________________
developers mailing list -- developers@lists.mariadb.org
To unsubscribe send an email to developers-le...@lists.mariadb.org

Reply via email to