On Thu, Apr 16, 2015 at 7:58 AM, Marcel Ruiz Forns <[email protected]> wrote:

> I followed the master-slave replication lag for some hours, and perceived a
> pattern in the lag: It gets progressively bigger with time, more or less
> with a 10 minute increase per hour, reaching lags of 1 to 2 hours. At that
> point, the data gap happens and the replication lag goes back to few minutes
> lag. I could only catch a data gap "live" 2 times, so that's definitely not
> a conclusive statement. But, there's this hypothesis that the two problems
> are related.

Today I've run some sync tests between the EL master and analytics-slave.
So far I've not found any discrepancies -- the master and slave
tables, when replication is caught up(!), have identical data. I infer
that the data gaps you found do exist but are not related to
replication or replication lag, and are occurring somewhere upstream
of analytics-store, either on the EL master (db1046) itself or between
the master and the consumer. I'll wait to see the example UUIDs to dig
further in the master binary logs.
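
For anyone who wants to repeat this kind of check, here is a minimal
sketch of a row-by-row comparison. It is not the exact test I ran. It
assumes you have already fetched the same table from master and slave
as lists of tuples, with the uuid in the first column; the names
row_digest and compare_tables are made up for illustration.

```python
import hashlib


def row_digest(row):
    """Return a stable digest of one row's column values.

    repr() is used so that NULL (None) and the empty string hash differently.
    """
    joined = "\x1f".join(repr(value) for value in row)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def compare_tables(master_rows, slave_rows, key_index=0):
    """Compare two row sets keyed on one column (assumed to be the uuid).

    Returns three sorted lists of keys:
    (missing_on_slave, extra_on_slave, mismatched).
    """
    master = {row[key_index]: row_digest(row) for row in master_rows}
    slave = {row[key_index]: row_digest(row) for row in slave_rows}

    missing_on_slave = sorted(k for k in master if k not in slave)
    extra_on_slave = sorted(k for k in slave if k not in master)
    mismatched = sorted(
        k for k in master if k in slave and master[k] != slave[k]
    )
    return missing_on_slave, extra_on_slave, mismatched
```

Only run this while replication is caught up, otherwise rows still in
flight will show up as missing on the slave.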

Regarding the replication lag, a few observations:

- Asynchronous replication will always be susceptible to lag as long
as the slave handles other traffic. The fixes done to have the
consumer batch-insert records have greatly reduced the lag problem so
that we haven't seen 24-hour+ lag in months, but asynchronous
replication does just what it says on the tin :-)

- An hour or two of lag observed infrequently is often due to some
*other* activity on the slave. The way to track it down is to first
look for patterns -- e.g., a certain time of day may indicate a poorly
optimized cron job or suchlike. If you do catch replication lag
greater than 5 minutes in the act, view the DB processlist to see what
other queries are executing. Check if something is simply hammering
the box, or if something is locking records or tables that are
attempting to replicate, or ... [insert strange cause here].
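
The processlist check above can be sketched roughly as follows. This
is only an illustration: it assumes you pass in Seconds_Behind_Master
from SHOW SLAVE STATUS and the rows of SHOW FULL PROCESSLIST as dicts
keyed by column name. The function name, the 60 second cutoff and the
"lock" substring match on State are my own assumptions, not anything
the server defines.

```python
LAG_THRESHOLD_S = 300  # the "greater than 5 minutes" rule of thumb


def suspects(lag_seconds, processlist, min_query_s=60):
    """List long-running client queries while the slave is lagging.

    Returns an empty list if lag is unknown (None, e.g. replication
    stopped) or not above the threshold.
    """
    if lag_seconds is None or lag_seconds <= LAG_THRESHOLD_S:
        return []

    found = []
    for proc in processlist:
        # Idle connections and the replication threads themselves
        # (shown as "system user") are not the cause we are hunting for.
        if proc["Command"] == "Sleep" or proc["User"] == "system user":
            continue
        if proc["Time"] < min_query_s:
            continue
        state = proc.get("State") or ""
        found.append({
            "Id": proc["Id"],
            "Time": proc["Time"],
            # Assumption: any State mentioning "lock" hints at contention.
            "lock_wait": "lock" in state.lower(),
            "Info": proc.get("Info"),
        })
    # Longest-running first, since those are the likeliest culprits.
    return sorted(found, key=lambda s: s["Time"], reverse=True)
```

Logging the result together with a timestamp each time lag crosses the
threshold should also make any time-of-day pattern easier to spot.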

BR
Sean

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
