[MariaDB developers] Re: [PATCH] MDEV-34481 optimize away waiting for owned by prepared xa non-unique index record

Kristian Nielsen via developers Wed, 31 Jul 2024 01:22:11 -0700

> From: Andrei <andrei.el...@mariadb.com>
>
> This work partly implements a ROW binlog format MDEV-33455 part
> that is makes non-unique-index-only table XA replication safe in RBR.
> Existence of at least one non-null unique index has always guaranteed
> safety (no hang to error).


Hi Andrei,

A couple early comments on your MDEV-34481 patch, about things that affect
non-XA parts of the code and will need to be changed:

> diff --git a/sql/sql_class.cc b/sql/sql_class.cc
> index 6419a58fbe4..b2321cfd865 100644
> --- a/sql/sql_class.cc
> +++ b/sql/sql_class.cc

> @@ -5444,20 +5449,33 @@ thd_need_wait_reports(const MYSQL_THD thd)
>    transaction.
>  */
>  extern "C" int
> -thd_rpl_deadlock_check(MYSQL_THD thd, MYSQL_THD other_thd)
> +thd_rpl_deadlock_check(MYSQL_THD thd, MYSQL_THD other_thd,
> +                       bool is_other_prepared, bool is_index_unique,
> +                       uint *flagged)

> +
> +  /*
> +    Return to the caller the fact of the non-unique index wait lock
> +    conflicts with one of a prepared state transaction.
> +  */
> +  if (!is_index_unique && rgi && rgi->is_row_event_execution()) {
> +    if (is_other_prepared && !other_thd)
> +    {
> +      rgi->exec_flags |= 1 << rpl_group_info::HIT_BUSY_INDEX;
> +      (*flagged)++;
> +    }
> +  }
> +

No, thd_rpl_deadlock_check() has a specific purpose related to handling
deadlock between active transactions. Don't add this XA-specific and
unrelated logic to this function.

Instead, add a separate function that InnoDB can call in the specific case
you need. That function then only needs the parameters `thd` and `flagged`,
the others are not needed:

  extern "C" bool thd_xa_skip_lock_check(MYSQL_THD thd, uint *flagged) {
    return (thd->rgi_slave && thd->rgi_slave->us_row_event_execution());
  }

Then in InnoDB, call something like this:

  if (lock->trx->state == TRX_STATE_PREPARED &&
      (!lock->index->is_unique() || lock->index->n_nullable))
    count_non_unique_index_hit++;


About the HIT_BUSY_INDEX, see here:

> diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
> index f3d01b7d9ed..166295ed263 100644
> --- a/sql/log_event_server.cc
> +++ b/sql/log_event_server.cc

> @@ -8149,11 +8154,20 @@ int Rows_log_event::find_row(rpl_group_info *rgi)
>                                                          HA_READ_KEY_EXACT))))
>      {
>        DBUG_PRINT("info",("no record matching the key found in the table"));
> -      if (error == HA_ERR_KEY_NOT_FOUND)
> -        error= row_not_found_error(rgi);
> -      table->file->print_error(error, MYF(0));
> -      table->file->ha_index_end();
> -      goto end;
> +      if (error == HA_ERR_LOCK_WAIT_TIMEOUT && !is_index_unique &&
> +          (rgi->exec_flags & (1 << rpl_group_info::HIT_BUSY_INDEX)))
> +      {
> +        rgi->exec_flags &= ~(1 << rpl_group_info::HIT_BUSY_INDEX);
> +        skip_first_compare= true;
> +      }

Here, you are effectively making a decision in the storage engine to skip
this record which is locked by an XA PREPAREd transaction. But you are
communicating this back to the server in a very convoluted way, by
introducing the rgi->exec_flags and setting HIT_BUSY_INDEX.

This is not the way to communicate back from engine to server. Instead,
simply return a different error code than HA_ERR_LOCK_WAIT_TIMEOUT from
InnoDB that you can check for here. You can maybe use one of the existing
HA_ERR_* from include/my_base.h, or add a new one if necessary.

  if (error == HA_ERR_<something>)
    skip_first_compare= true;

> diff --git a/sql/rpl_rli.h b/sql/rpl_rli.h
> index 3f24481839b..a9ec44b58cb 100644
> --- a/sql/rpl_rli.h
> +++ b/sql/rpl_rli.h
> @@ -833,6 +833,15 @@ struct rpl_group_info
>      RETRY_KILL_KILLED
>    };
>    uchar killed_for_retry;
> +  enum enum_exec_flags {
> +    HIT_BUSY_INDEX= 0
> +  };
> +  /*
> +    Being executed replication event's context. It could contain
> +    notification from engine such as attempts to lock already acquired
> +    index record.
> +  */
> +  uint32 exec_flags;

... and then you don't need to add this.


Also, are you aware that when you skip a record like this, you may be
skipping the very record that needs to be changed?

For example a case like this, with three transactions in order:

  XA START 'T1';
  UPDATE my_table SET value=10 WHERE key=2
  XA END 'T1'; XA PREPARE 'T1';

  XA COMMIT 'T1';

  XA START 'T2';
  UPDATE my_table SET value=20 WHERE key=2;

Transaction 'T2' may run speculatively in optimistic parallel replication
after prepare of T1, but before commit of T1. Then it will skip the record
key=2, IIUC.

I didn't see any explanation of this case in the patch commit message or
comments. Also I don't see this case tested in the test case, that needs to
be added I believe.

 - Kristian.
_______________________________________________
developers mailing list -- developers@lists.mariadb.org
To unsubscribe send an email to developers-le...@lists.mariadb.org

[MariaDB developers] Re: [PATCH] MDEV-34481 optimize away waiting for owned by prepared xa non-unique index record

Reply via email to