On Wed, Nov 25, 2015 at 1:21 PM, Jaime Crespo <[email protected]> wrote:
> Always fearing doing queries on a lagged replica on labs? Not anymore!
>
> While Betacommand's tool [0] was very useful, it was also very inaccurate,
> as it tried to check the lag by looking at the last rows updated, which can
> be a lot of time on the least popular wikis.
>
> What I offer now is sub-second accurate lag measuring, by writing on the
> production masters the current time, in microseconds, every 0.5 seconds and
> making that available on all hosts (using this tool [1]). So, it is more
> accurate than SHOW SLAVE STATUS, because it compares the difference with the
> original master, and it will work even if replication is broken.
>
> To read it, just do SELECT * FROM heartbeat_p.heartbeat;
> And you will get:
> +-------+----------------------------+------+
> | shard | last_updated               | lag  |
> +-------+----------------------------+------+
> | s6    | 2015-11-25T20:20:32.000980 |    0 |
> | s2    | 2015-11-25T20:20:32.001030 |    0 |
> | s7    | 2015-11-25T20:20:32.001070 |    0 |
> | s3    | 2015-11-25T20:20:32.001000 |    0 |
> | s4    | 2015-11-25T20:20:32.000920 |    0 |
> | s1    | 2015-11-25T20:20:32.000740 |    0 |
> | s5    | 2015-11-25T20:20:32.000830 |    0 |
> +-------+----------------------------+------+
>
> Read the detailed documentation on: [2]
>
> Use it, create a web page if you want to make it public! Report a ticket if
> it gets too high! Report a ticket if you need more info (a record per
> wiki?). But I wanted to give you the essentials, and you can build
> yourselves on top of that.
>
> Only 2 known bugs:
> - There is microsecond accuracy, but it cannot be used until a bug in
>   MariaDB is fixed [3]
> - enwiki will only report s1 lag until that server is restarted due to some
>   existing filters. We will schedule that at some time in the future.
>
> [0]<http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag>
> [1]<https://www.percona.com/doc/percona-toolkit/2.2/pt-heartbeat.html>
> [2]<https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Identifying_lag>
> [3]<https://mariadb.atlassian.net/browse/MDEV-9175>
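
For anyone who wants to build on the heartbeat table quoted above, here is a
minimal Python sketch of two things: computing lag from a last_updated
timestamp, and turning the per-shard rows into per-wiki lag by looking up each
wiki's slice. It is only an illustration under stated assumptions, not the
code of any existing tool. The function names and sample rows are hypothetical,
the rows are assumed to have already been fetched from heartbeat_p.heartbeat
and meta_p.wiki, and the slice values are assumed to possibly carry a host
suffix such as "s1.labsdb".

```python
from datetime import datetime


def lag_seconds(last_updated, now):
    """Seconds between a heartbeat timestamp string and `now`.

    Assumes both values are in the same time zone (naive datetimes).
    """
    return (now - datetime.fromisoformat(last_updated)).total_seconds()


def per_wiki_lag(wikis, heartbeat):
    """Match each wiki to the lag of the shard it lives on.

    wikis:     iterable of (dbname, slice) pairs, e.g. from meta_p.wiki
    heartbeat: iterable of (shard, lag) pairs, e.g. from heartbeat_p.heartbeat
    Assumption: a slice may look like 's1.labsdb', so only the part
    before the first dot is compared with the shard name.
    """
    lag_by_shard = dict(heartbeat)
    result = {}
    for dbname, slice_name in wikis:
        shard = slice_name.split(".", 1)[0]
        # None marks a wiki whose shard has no heartbeat row.
        result[dbname] = lag_by_shard.get(shard)
    return result


# Hypothetical sample data; wiki_a and wiki_b are made-up database names.
heartbeat_rows = [("s1", 0), ("s2", 0), ("s3", 3)]
wiki_rows = [("enwiki", "s1.labsdb"), ("wiki_a", "s3.labsdb"), ("wiki_b", "s9.labsdb")]
print(per_wiki_lag(wiki_rows, heartbeat_rows))
```

Returning None for an unmatched shard, rather than 0, keeps a missing
heartbeat row from being mistaken for a replica that is fully caught up.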

I made a tool [4] that reads the heartbeat_p database from the server that
hosts each shard and matches it with the shard for each wiki. The tool gets
all (dbname, slice) pairs from meta_p.wiki and the replag for each slice from
heartbeat_p.heartbeat on the server hosting that slice, and then matches them
up in the table. I think I got the logic here right, but you can view the
source [5] to see if you agree.

[4]: https://tools.wmflabs.org/replag/
[5]: https://tools.wmflabs.org/replag/?source

Bryan
--
Bryan Davis              Wikimedia Foundation    <[email protected]>
[[m:User:BDavis_(WMF)]]  Sr Software Engineer    Boise, ID USA
irc: bd808               v:415.839.6885 x6855

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
