> How about doing this per wiki?

That is as easy as joining this with the meta tables. I challenge all of you to see who can make it faster :-)
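
To make the idea concrete, here is a rough sketch of such a join, not a finished tool. It runs against an in-memory SQLite database so anyone can try it without touching the replicas. The `wiki` table and its `dbname` and `shard` columns are stand-ins I made up for this example and are not the real layout of the meta tables; the `heartbeat` table only mimics the `shard` and `last_updated` columns of heartbeat_p.heartbeat. The timestamps are copied from the announcement quoted below, and the jaimewiki-to-s2 mapping is invented. Lag is computed as described in the thread: current time minus the last timestamp written.

```python
import sqlite3
from datetime import datetime


def per_wiki_lag(conn, now):
    """Return {dbname: (shard, lag_in_seconds)} by joining wikis to heartbeats."""
    rows = conn.execute(
        "SELECT w.dbname, h.shard, h.last_updated "
        "FROM wiki AS w JOIN heartbeat AS h ON h.shard = w.shard "
        "ORDER BY w.dbname"
    ).fetchall()
    result = {}
    for dbname, shard, last_updated in rows:
        written = datetime.fromisoformat(last_updated)
        # Lag = current time minus the last heartbeat timestamp seen.
        result[dbname] = (shard, (now - written).total_seconds())
    return result


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the real meta tables may look different.
conn.executescript(
    """
    CREATE TABLE heartbeat (shard TEXT PRIMARY KEY, last_updated TEXT);
    CREATE TABLE wiki (dbname TEXT PRIMARY KEY, shard TEXT);
    """
)
conn.executemany(
    "INSERT INTO heartbeat VALUES (?, ?)",
    [
        ("s1", "2015-11-25T20:20:32.000740"),
        ("s2", "2015-11-25T20:20:32.001030"),
    ],
)
conn.executemany(
    "INSERT INTO wiki VALUES (?, ?)",
    [("enwiki", "s1"), ("jaimewiki", "s2")],  # jaimewiki mapping is made up
)

# A fixed "now" keeps the example reproducible.
now = datetime(2015, 11, 25, 20, 20, 33)
lags = per_wiki_lag(conn, now)
for dbname, (shard, lag) in lags.items():
    print(f"{dbname} ({shard}): {lag:.3f} s behind")
```

On the replicas the same join would presumably be a single SQL statement against heartbeat_p.heartbeat and whichever meta table maps each wiki to its shard; the Python wrapper is only there to keep the sketch self-contained.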

On Thu, Nov 26, 2015 at 6:49 PM, Yetkin Sakal <[email protected]> wrote:

> How about doing this per wiki?
>
> On Thursday, November 26, 2015 10:19 AM, Jaime Crespo
> <[email protected]> wrote:
>
> So even if the replicas don't get updated the heartbeat will report them
> as up to date?
>
> Not sure exactly what you mean by that. The masters will be updated
> continuously every 0.5 seconds (all slaves are read-only; no writes are
> done there). If replication works, and slaves get updated, that will mean
> that they receive the heartbeat through the same replication channel as
> the rest of the updates. If replication doesn't work, and replicas do not
> get updated, they will not receive the heartbeat either, as it arrives
> through replication, in order. If replication stops/fails, heartbeat
> updates will stop (from the slave's perspective), and lag will start to
> increase from your perspective (difference between last timestamp written
> and current time).
>
> This measures the replication lag (i.e. the difference from the master),
> not the last time an edit was done by a user, which was what the first
> link I sent measured. In other words, if jaimewiki receives user edits
> only once an hour, heartbeat will still do a write to its master every
> half a second, thus proving that it is up to date with that resolution.
> You can still check the last user edit by checking recentchanges.
>
> The only reason this could fail (heartbeat updated but wiki not) is if
> there was a specific filter denying replication but allowing heartbeat,
> which is only done for specific tables and private wikis. Also the
> production master could have a problem, but that would affect the wikis
> themselves, not only labs.
>
> To give you an idea of the accuracy of this method, we (will) use it on
> production to decide if a slave is usable or not to return up-to-date data.
>
> For more information on how this works, check
> <https://www.percona.com/doc/percona-toolkit/2.1/pt-heartbeat.html#description>
>
> On Wed, Nov 25, 2015 at 9:51 PM, Ricordisamoa
> <[email protected]> wrote:
>
> On 25/11/2015 21:21, Jaime Crespo wrote:
>
> Always fearing doing queries on a lagged replica on labs? Not anymore!
>
> While Betacommand's tool [0] was very useful, it was also very inaccurate,
> as it tried to check the lag by looking at the last rows updated, which can
> be a long time ago on the least popular wikis.
>
> What I offer now is sub-second accurate lag measuring, by writing on the
> production masters the current time, in microseconds, every 0.5 seconds and
> making that available on all hosts (using this tool [1]). So, it is more
> accurate than SHOW SLAVE STATUS, because it compares the difference with
> the original master, and it will work even if replication is broken.
>
> So even if the replicas don't get updated the heartbeat will report them
> as up to date?
>
> To read it, just do SELECT * FROM heartbeat_p.heartbeat;
> And you will get:
> +-------+----------------------------+------+
> | shard | last_updated               | lag  |
> +-------+----------------------------+------+
> | s6    | 2015-11-25T20:20:32.000980 |    0 |
> | s2    | 2015-11-25T20:20:32.001030 |    0 |
> | s7    | 2015-11-25T20:20:32.001070 |    0 |
> | s3    | 2015-11-25T20:20:32.001000 |    0 |
> | s4    | 2015-11-25T20:20:32.000920 |    0 |
> | s1    | 2015-11-25T20:20:32.000740 |    0 |
> | s5    | 2015-11-25T20:20:32.000830 |    0 |
> +-------+----------------------------+------+
>
> Read the detailed documentation on: [2]
>
> Use it, create a web page if you want to make it public! File a ticket
> if it gets too high! File a ticket if you need more info (a record per
> wiki?). But I wanted to give you the essentials, and you can build
> on top of that yourselves.
>
> Only 2 known bugs:
> - There is microsecond accuracy, but it cannot be used until a bug in
>   MariaDB is fixed [3]
> - enwiki will only report s1 lag until that server is restarted due to
>   some existing filters. We will schedule that at some time in the future.
>
> [0] <http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag>
> [1] <https://www.percona.com/doc/percona-toolkit/2.2/pt-heartbeat.html>
> [2] <https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Identifying_lag>
> [3] <https://mariadb.atlassian.net/browse/MDEV-9175>
>
> --
> Jaime Crespo
> <http://wikimedia.org>

--
Jaime Crespo
<http://wikimedia.org>
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
