Hi Marcel

On Wed, Apr 15, 2015 at 2:23 AM, Marcel Ruiz Forns <[email protected]> wrote:
> Hi Sean,
>
> Here's Marcel from Analytics.
>
> I'd like to comment with you some strange behaviors that we've observed on
> EventLogging database (m4-master.equiad.wmnet).
>
> 1) There are some time spans where there is no data in any table. Examples
> follow:
>
> 2015-04-09 17:20 -> 18:35
> 2015-04-11 03:35 -> 05:20
> 2015-04-11 14:00 -> 15:20
> 2015-04-11 19:00 -> 20:00
> 2015-04-12 14:30 -> 15:40
> 2015-04-13 11:35 -> 12:30
> 2015-04-13 16:30 -> 17:35
>
> This is happening a lot, as you can see, so we are really concerned about
> it.
> These outages are not explained by any of the EventLogging logs (processor
> logs, consumer logs) which confirm that the events were actually sent
> normally to the database (and written) for those time spans.

We have a binary log on the EL master that holds the last week of
INSERT statements. It can be dumped and grepped, eg looking at
10-minute blocks around 2015-04-13 16:30:

root@db1046:# /opt/wmf-mariadb10/bin/mysqlbinlog db1046-bin.000* |
grep INSERT | grep '20150413161' | wc -l
26849

root@db1046:# /opt/wmf-mariadb10/bin/mysqlbinlog db1046-bin.000* |
grep INSERT | grep '20150413163' | wc -l
0

Zero(!) during 10min after 16:30 doesn't look good. This means the
database master did not see any INSERTs with 20150413163* timestamps
within the last week. Can you describe how you know that events were
written normally? Simply a lack of errors from mysql consumer?

Can you supply some specific records from the EL logs with timestamps
that should definitely be in the database, so we can scan the database
binlog for specific UUIDs or suchlike?

>
> 2) Maybe it has something to do with it, or not. But we observed that
> sometimes the last events inserted in the tables date from 30mins - 1h30mins
> in the past. At other moments, the last inserts date from seconds ago. This
> is maybe expected due to slave replication but we're not sure.

Simple answer is: Yes, lag is quite possible given the high write rate
of EL, and can also be affected by load from other replication streams
of the production wikis (cumulative and affected by time of day,
spikes, jobrunners, etc). It's possible we could tune it better though
-- can you give me some idea of how long your "at other moments" delay
is?

BR
Sean

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

Reply via email to