Thank you for your response!
The problem has happened again today, this time to one of the upstream
servers.
First (most important) question: One of the replicas filled the disk
(again). It's the IO thread that's hung:
+----+-------------+------+------+--------------+-------+--------------------------------------------------------+------+----------+
| Id | User        | Host | db   | Command      | Time  | State                                                  | Info | Progress |
+----+-------------+------+------+--------------+-------+--------------------------------------------------------+------+----------+
| 5  | system user |      | NULL | Slave_IO     | 93846 | Waiting for someone to free space                      | NULL | 0.000    |
| 7  | system user |      | NULL | Slave_worker | 6688  | Commit                                                 | NULL | 0.000    |
| 8  | system user |      | NULL | Slave_worker | 6688  | Waiting for someone to free space                      | NULL | 0.000    |
| 6  | system user |      | NULL | Slave_SQL    | 6987  | Slave has read all relay log; waiting for more updates | NULL | 0.000    |
+----+-------------+------+------+--------------+-------+--------------------------------------------------------+------+----------+
I've cleared space (at least enough that it should be able to process
for a few minutes; I'm just trying to get it responsive so I can
restart it gracefully), but the thread won't budge. Oddly enough,
PURGE BINARY LOGS TO ... appeared to work in that it cleared the older
logs I didn't need, but the command itself never returned; it was
hanging too and I had to Ctrl-C it. I also tried STOP SLAVE SQL_THREAD
(with the intention of following it with STOP SLAVE IO_THREAD), but it
hung as well. I tried Ctrl-C there too, yet it's still showing in the
process list as "KILLED".
Is there anything I can do to get the slave thread responsive again,
short of force-shutting down the DB? Should I try STOP SLAVE
IO_THREAD?
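For reference, roughly the sequence I ran on C (a sketch only; the
binlog file name below is a placeholder, not the real one):

SHOW FULL PROCESSLIST;
-- output as above: Slave_IO and one Slave_worker waiting for free space
PURGE BINARY LOGS TO 'mysql-bin.000123';  -- placeholder name; removed old logs but never returned
STOP SLAVE SQL_THREAD;                    -- also hung; still shows KILLED after Ctrl-C
-- not yet attempted:
-- STOP SLAVE IO_THREAD;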
As far as the binlog format issue:
I have a production primary DB (A) which feeds two replicas (let's call
them B and C); i.e., both B and C are directly connected to and
replicating from A. Other servers then replicate from C. The
interesting thing here is that the problem showed up this time on both
B and C; both were exhibiting the behavior where everything they wrote
into their logs was in ROW format. I checked the global
BINLOG_FORMAT and it was "MIXED" as I expected (on both B and C). Is
there any way to see what the BINLOG_FORMAT is of the slave threads
themselves while they are still running? B was not out of space, so I
was able to restart the slave there and I'm waiting to see if the binary
logs are still problematic. C is the one that is still hung after
clearing space.
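In case it's useful, this is how I checked (a sketch; as far as I can
tell it only shows the global value and my own session's value, not
what the running slave threads are using, which is what I'd like to
inspect):

SHOW GLOBAL VARIABLES LIKE 'binlog_format';   -- returned MIXED on both B and C
SELECT @@global.binlog_format, @@session.binlog_format;   -- my session only, not the slave threads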
And in all of this I should say no, we don't have anything (at least
that I can find) that is changing the BINLOG_FORMAT directly. Indeed,
server B is AWS RDS, and I don't think anything has access to change it
without shutting down the server anyway (which I know has not happened).
Thank you!
Dan
On 3/5/2025 8:08 AM, Kristian Nielsen wrote:
mariadb--- via discuss <discuss@lists.mariadb.org> writes:
Does someone know of a reason why a downstream replica
would--seemingly spontaneously--start writing ALL of its binary log
entries in ROW format, even though binlog_format is MIXED?
I stopped/restarted it this morning and it's writing in MIXED again,
as expected.
This suggests that binlog format was set to ROW at some point, and that the
slave SQL thread had not been restarted since it was set back to MIXED.
I.e., the following scenario:
SET GLOBAL binlog_format=ROW;
STOP SLAVE;
START SLAVE;
SET GLOBAL binlog_format=MIXED;
Like other variables, SET GLOBAL only takes effect for new sessions (or
newly started slave threads). Thus, this would explain why stop/restart made
it go back to MIXED.
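A quick way to see this effect (just a sketch) is to change the global
and then check the session value from an already-open connection:

SET GLOBAL binlog_format=ROW;
SELECT @@global.binlog_format, @@session.binlog_format;
-- the global now shows ROW, but the existing session (like a running
-- slave thread) keeps the value it started with until it is restarted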
Any ideas?
This would require that the binlog format had been set to ROW temporarily
sometime before the last slave restart, and back to MIXED after, which is
not known to have happened from what you wrote; but it seems a likely
explanation.
- Kristian.