Hi!

On Fri, Feb 21, 2025 at 3:44 PM Kristian Nielsen
<kniel...@knielsen-hq.org> wrote:
>
> Hi Monty,
>
> > From: Monty <mo...@mariadb.org>
> >
> > The purpose of this commit is to ensure that creation and changes of
> > temporary tables are properly and predicable logged to the binary
> > log.  It also fixes some bugs where ROW logging was used in MIXED mode,
> > when STATEMENT would be a better (and expected) choice.
>
>
> Here is my review of the patch for MDEV-36099 predictable binlog format for
> temporary tables:
<cut>

> I think it is important to mention this prominently in the release notes.
> There will invariably be some users that manipulate a large number of rows
> with temporary tables, eg.
>
>   BEGIN;
>   CREATE TEMPORARY TABLE tmp AS SELECT a, b, c FROM large_table1 JOIN 
> large_table2 ON ...
>   INSERT INTO other_table SELECT b, c FROM tmp WHERE a <100;
>   DROP TEMPORARY TABLE tmp;
>   COMMIT;
>
> This will change from binlogging a few hundred bytes of Query events to
> potentially many megabytes of row events, which could cause problems in some
> cases. I think the change is still justified from the benefits it gives and
> the relative few cases where this will cause serious problems, but it is
> still important to try to document it as well as possible.

I added the following last to the commit message:

The consequence of the above is that manipulations of a lot of rows
through temporary tables will by default be be slower in mixed mode.

For example:
  BEGIN;
  CREATE TEMPORARY TABLE tmp AS SELECT a, b, c FROM large_table1 JOIN
large_table2 ON ...
  INSERT INTO other_table SELECT b, c FROM tmp WHERE a <100;
  DROP TEMPORARY TABLE tmp;
  COMMIT;

By default this will create a huge entry in the binary log, compared
to just a few hundred bytes in statement mode. However the change in
this commit will make usage of temporary tables more reliable and
predicable and is thus worth it. Using statement mode or
setting create_temporary_table_binlog_formats can be used to avoid
this issue.

<cut>

> Questions:
>
> > diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
> > index c2f842aea56..3d1df5ac42f 100644
> > --- a/sql/sql_parse.cc
> > +++ b/sql/sql_parse.cc
> > @@ -531,6 +531,7 @@ void init_update_queries(void)
> >                                              CF_SCHEMA_CHANGE;
> >    sql_command_flags[SQLCOM_CREATE_SEQUENCE]=  (CF_CHANGES_DATA |
> >                                              CF_REEXECUTION_FRAGILE |
> > +                                            
> > CF_FORCE_ORIGINAL_BINLOG_FORMAT |
> >                                              CF_AUTO_COMMIT_TRANS |
> >                                              CF_SCHEMA_CHANGE);
>
> I did not get the reason for this change. I assume some bug fix found during
> test case writing?

Without the above change, the create of temporary sequences will be
binary logged.
If you remove the above, the the test binlog.binlog_empty_xa_prepared,mix
will fail with:
@@ -21,6 +21,11 @@
DROP TABLE tmp_1;
# Proof of  correct logging incl empty XA-PREPARE
include/show_binlog_events.inc
+Log_name       Pos     Event_type      Server_id       End_log_pos     Info
+master-bin.000001      #       Gtid    #       #       GTID #-#-#
+master-bin.000001      #       Query   #       #       use `test`;
CREATE TEMPORARY SEQUENCE seq_1
+master-bin.000001      #       Gtid    #       #       GTID #-#-#
+master-bin.000001      #       Query   #       #       DROP TEMPORARY
SEQUENCE IF EXISTS `test`.`seq_1` /* generated by server */

The reason is that without the flag, using CREATE TEMPORARY SEQUENCE
will be forced to be executed in statement mode (default for anything
that does not change row data)
 and thus binlogged.  CREATE TABLE has the flag
CF_CAN_GENERATE_ROW_EVENTS set, which does the same thing as
CF_FORCE_ORIGINAL_BINLOG_FORMAT.

> > --- a/sql/sql_class.h
> > +++ b/sql/sql_class.h
> > @@ -817,6 +817,7 @@ typedef struct system_variables
> >    /* Flags for slow log filtering */
> >    ulong log_slow_rate_limit;
> >    ulong binlog_format; ///< binlog format for this thd (see 
> > enum_binlog_format)
> > +  ulong create_temporary_table_binlog_formats;
>
> Is there any way to make this take less space than 8 bytes (on 64-bit)? It
> goes into every THD, and only two bits are used.

Unfortunately the interface of used for integer variables that can be
set from the command line only supports long.

>
> > +
> > +    Temporary tables will be logged only on CREATE in STMT format
> > +    or on INSERT if all changes to the table is in the binlog.
> >    */
> >    if ((WSREP_EMULATE_BINLOG(thd) || mysql_bin_log.is_open()) &&
> > -      (likely(!error) || thd->transaction->stmt.modified_non_trans_table ||
> > -       thd->log_current_statement()))
> > +      (table->s->using_binlog() ||
> > +       ((in_create_table &&
> > +         (!table->s->tmp_table || thd->binlog_create_tmp_table())))) &&
> > +       (likely(!error) ||
> > +        (!in_create_table &&
> > +         (thd->transaction->stmt.modified_non_trans_table ||
> > +          thd->log_current_statement()))))
>
> This conditional is now very hard to understand.
> I think I managed to understand all parts of it, except the part
> "thd->log_current_statement()", but that is unchanged with your patch.
> This if () would now benefit from being split into multiple parts, each of
> which explained in a comment.

Agree that the above is complex an could be documented better.
However the above will be totally rewritten by the next commit where the create
part is not done by the above function, so there is no need to comment this.

thd->log_current_statement() is used by some commands to force things to be
binlogged even if there was an error.  For example a CREATE TABLE ... SELECT
has to be logged in some cases if the original table was dropped (in
case of replace).
This will be fixed for most cases in the atomic create or replace
patch, which depends on
this patch.

The new code will look like this:

  if (!create_info &&
      (WSREP_EMULATE_BINLOG(thd) || mysql_bin_log.is_open()) &&
      table->s->using_binlog() &&
      (likely(!error) || thd->transaction->stmt.modified_non_trans_table ||
       thd->log_current_statement()))
  {

> > diff --git a/mysql-test/main/statistics.result 
> > b/mysql-test/main/statistics.result
> > index bb6469c46da..ebd9cb7f36e 100644
> > --- a/mysql-test/main/statistics.result
> > +++ b/mysql-test/main/statistics.result
> > @@ -1278,7 +1278,7 @@ Table   Op      Msg_type        Msg_text
> >  mysql.column_stats   analyze error   Invalid argument
> >  ANALYZE TABLE mysql.column_stats;
> >  Table        Op      Msg_type        Msg_text
> > -mysql.column_stats   analyze status  OK
> > +mysql.column_stats   analyze status  Table is already up to date
> >  SELECT * FROM mysql.table_stats;
> >  db_name      table_name      cardinality
> >  SELECT * FROM mysql.column_stats;
>
> I did not understand why this .result change happens after your patch?

This is because just before we executed 'DELETE FROM TABLE mysql.table_stats'
Because of a 'bug in the old code' that switches everything to row mode,
the DELETE was done in row mode which does a delete row by row.
Now the DELETE is executed in statement mode and the DELETE is optimized
by running truncate internally.  Truncate resets all status flags and the table
is 'up to date' from analyze tables point of view.
> Minor things:
>
> > diff --git a/mysql-test/suite/rpl/t/create_or_replace.inc 
> > b/mysql-test/suite/rpl/t/create_or_replace.inc
> > index e8fa95cb94a..b4e91d7b412 100644
> > --- a/mysql-test/suite/rpl/t/create_or_replace.inc
> > +++ b/mysql-test/suite/rpl/t/create_or_replace.inc
> > @@ -4,7 +4,24 @@
> >  --let $rpl_topology=1->2
> >  --source include/rpl_init.inc
> >
> > -# Create help tables
> > +select @@binlog_format, @@create_temporary_table_binlog_formats;
> > +
> > +# Copy create_temporary_table_binlog_formats from master to slave
> > +# This is done to get same results as with older versions of MariaDB.
> > +# The slave will work even if this is not done. However in mixed
> > +# format on slave temporary tables would not be logged to in the
> > +# slaves binary log
> > +let $format=`select @@create_temporary_table_binlog_formats`;
> > +connection server_2;
> > +--eval set @@global.create_temporary_table_binlog_formats='$format'
> > +# Ensure that the slave threads uses the new values.
> > +stop slave;
> > +start slave;
>
> Here, use --source include/stop_slave.inc and --source
> include/start_slave.inc. For consistent with other test cases.

Changed.

>
> >     - Rename of not binlogged temporary tables where binlogged to binary log
> >       which caused replication to stop.
>
> s/where/were/

Changed

> > +  /*
> > +    Mark if query was logged as statement. Used to check it tmp table 
> > changes
> > +    are logged.
> > +  */
> > +  bool query_binlogged_as_stmt;
>
> s/check it table/check if table/

Fixed

> +  /*
> +    This is set for all normal tables and for temporary tables that has
> +    the CREATE binary logged.
> +
> +    0 table table is not in binary log (not logged temporary table)
> +    1 table create was logged (normal table or logged temp table)
> +    2 table create was logged but not all changes are in the binary log.
> +      ROW LOGGING will be used for the table.
>
> s/that has the CREATE/that have the CREATE/
> s/table table is not/table is not/

Fixed

> > @@ -794,6 +821,20 @@ void THD::mark_tmp_table_as_free_for_reuse(TABLE 
> > *table)
> >
> >    DBUG_ASSERT(table->s->tmp_table);
> >
> > +  /*
> > +    Ensure that table changes where either binary logged or the table
> > +    is marked as not up to date.
> > +  */
>
> s/where/were/

Fixed.

Thanks a lot for the review!

Regards,
Monty
_______________________________________________
developers mailing list -- developers@lists.mariadb.org
To unsubscribe send an email to developers-le...@lists.mariadb.org

Reply via email to