On 25/11/2015 21:21, Jaime Crespo wrote:
Always afraid of running queries against a lagged replica on Labs? Not anymore!

While Betacommand's tool [0] was very useful, it was also very inaccurate: it estimated the lag from the most recently updated rows, which on the least popular wikis can be a long time ago.

What I offer now is lag measurement with sub-second accuracy: the current time, in microseconds, is written on the production masters every 0.5 seconds and made available on all hosts (using this tool [1]). It is more accurate than SHOW SLAVE STATUS because it compares against the original master, and it works even if replication is broken.

So even if the replicas don't get updated, the heartbeat will report them as up to date?


To read it, just do SELECT * FROM heartbeat_p.heartbeat;
And you will get:
+-------+----------------------------+------+
| shard | last_updated               | lag  |
+-------+----------------------------+------+
| s6    | 2015-11-25T20:20:32.000980 |    0 |
| s2    | 2015-11-25T20:20:32.001030 |    0 |
| s7    | 2015-11-25T20:20:32.001070 |    0 |
| s3    | 2015-11-25T20:20:32.001000 |    0 |
| s4    | 2015-11-25T20:20:32.000920 |    0 |
| s1    | 2015-11-25T20:20:32.000740 |    0 |
| s5    | 2015-11-25T20:20:32.000830 |    0 |
+-------+----------------------------+------+
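
As a rough illustration of how a heartbeat timestamp turns into a lag figure, here is a minimal Python sketch. It assumes the last_updated values are UTC and follow the format shown in the table above; the function name heartbeat_lag_seconds and the "now" value are hypothetical, and this is not necessarily how the lag column itself is computed.

```python
from datetime import datetime, timezone


def heartbeat_lag_seconds(last_updated: str, now: datetime) -> float:
    """Return the seconds elapsed between a heartbeat timestamp and `now`."""
    # Assumption: timestamps look like 2015-11-25T20:20:32.000980 and are UTC.
    beat = datetime.strptime(last_updated, "%Y-%m-%dT%H:%M:%S.%f")
    beat = beat.replace(tzinfo=timezone.utc)
    # Clamp at zero so small clock differences never yield a negative lag.
    return max(0.0, (now - beat).total_seconds())


# Hypothetical "current time", half a second after the sample heartbeat.
now = datetime(2015, 11, 25, 20, 20, 32, 500000, tzinfo=timezone.utc)
lag = heartbeat_lag_seconds("2015-11-25T20:20:32.000980", now)
print(f"s6 lag: {lag:.6f} s")
```

If the timestamp stops advancing, the value returned here keeps growing with the clock.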

Read the detailed documentation at [2].

Use it, and create a web page if you want to make it public! File a ticket if the lag gets too high! File a ticket if you need more info (a record per wiki?). I wanted to give you the essentials, and you can build on top of that yourselves.
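
For anyone building on top of this, a check for "too high" could start from something like the Python sketch below. It assumes the rows have already been fetched as (shard, lag) pairs and that lag is in seconds; the function name lagged_shards, the 60-second threshold, and the sample values are all made up for illustration.

```python
from typing import Iterable, List, Tuple


def lagged_shards(rows: Iterable[Tuple[str, int]], max_lag: int = 60) -> List[str]:
    """Return the shards whose lag exceeds max_lag, sorted by name."""
    # Assumption: each row is (shard, lag) with lag expressed in seconds.
    return sorted(shard for shard, lag in rows if lag > max_lag)


# Made-up sample data, not real measurements.
sample = [("s1", 0), ("s2", 125), ("s3", 3), ("s7", 900)]

for shard in lagged_shards(sample):
    print(f"{shard} is lagging more than 60 seconds")
```

The threshold is a parameter so that a public status page and an alerting script can use different limits.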

Only 2 known bugs:
- There is microsecond accuracy, but it cannot be used until a bug in MariaDB is fixed [3].
- enwiki will only report s1 lag until that server is restarted, due to some existing filters. We will schedule that at some time in the future.

[0]<http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag>
[1]<https://www.percona.com/doc/percona-toolkit/2.2/pt-heartbeat.html>
[2]<https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Identifying_lag>
[3]<https://mariadb.atlassian.net/browse/MDEV-9175>
--
Jaime Crespo
<http://wikimedia.org>


_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l