Thanks for such comprehensive notes!

On Friday, June 20, 2014, Christian Aistleitner <[email protected]>
wrote:

> Hi,
>
> TL;DR: When consuming EventLogging data, only rely on the 'log'
> database available from m2 replicas, like analytics-store.eqiad.wmnet.
>
> Other representations might not get updated, might not get fix-ups or
> may (on purpose) give you unvalidated data.
>
>
> ----------------------------------
>
>
> Due to the versatile design of EventLogging, its data exists/existed
> in many different representations, which left me confused about what
> data quality to expect from each. I also could not find those
> expectations publicly documented. After talking about different
> aspects with a few people, I wanted to put my current understanding
> up for public discussion.
>
> Please let me know (either in private or on list) if something looks
> wrong or does not match your use of EventLogging data.
>
>
> * MySQL / MariaDB database on m2
>
> This database is the best place to consume EventLogging data from.
>
> Available as 'log' database on m2 replicas, such as
> analytics-store.eqiad.wmnet.
>
> Only validated events enter the database.
>
> In case of bugs, this database is the only place that gets fixes like
> cleanup of historic data, or live fixes.
>
>
>
> * 'all-events' JSON log files [1]
>
> Use this data source only to debug issues around ingestion into the m2
> database.
>
> Entries are JSON objects.
>
> Only validated events get written.
>
> In case of bugs, historic data does not get fixed.
>
>
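
Inline note: for debugging against the 'all-events' files, a minimal Python sketch along these lines may help. It assumes one JSON object per line (the notes only say "Entries are JSON objects", so the line-per-entry layout is an assumption), and `read_events` is a hypothetical helper name, not part of EventLogging.

```python
import gzip
import json
from typing import Iterator


def read_events(path: str) -> Iterator[dict]:
    """Yield each JSON object found in a gzipped log file.

    Assumption: the file holds one JSON object per line.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # Skip lines that are not valid JSON instead of aborting.
                continue
            # Only JSON objects count as events; ignore anything else.
            if isinstance(event, dict):
                yield event
```

Since historic data in these files does not get fixed, anything read this way should be compared against the 'log' database rather than trusted on its own.
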
>
> * Raw client and server side log files [2]
>
> Use this data source only to debug issues around ingestion into the m2
> database.
>
> Entries are the parameters of the event.gif request. They are not
> decoded at all.
>
> In case of bugs, historic data does not get fixed. Hot-fixes need not
> reach those files either.
>
>
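
Inline note: since the raw entries are not decoded, a debugging session has to do the decoding itself. The sketch below assumes the request parameters are a percent-encoded JSON object, optionally preceded by "?". That layout is my assumption, not something stated in the notes, and `decode_raw_query` is a hypothetical helper.

```python
import json
from urllib.parse import unquote


def decode_raw_query(raw_query: str) -> dict:
    """Decode one raw, still percent-encoded parameter string.

    Assumption: the parameters are a percent-encoded JSON object,
    optionally preceded by '?'.
    """
    text = unquote(raw_query.lstrip("?"))
    event = json.loads(text)
    if not isinstance(event, dict):
        raise ValueError("decoded payload is not a JSON object")
    return event
```

The result is only decoded, not validated, so it may contain events that never reached the m2 database.
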
>
> * Kafka:
> EventLogging data has not been fed into Kafka since 2014-06-12 [3].
> The EventLogging data in Kafka had no users.
> Turning it on again is tracked in bug 66528 [4].
>
>
>
> * MongoDB:
> EventLogging data has not been fed into MongoDB since 2014-02-13 [5].
> The EventLogging data in MongoDB did not appear to get used.
> I am not aware of plans to revive feeding the data into MongoDB.
>
>
>
> * ZMQ:
> ZMQ is available from vanadium.
> In case of bugs, historic data cannot get fixed :-)
> Data coming from the forwarders (ports 8421, 8422) is not validated
> and need not receive hot-fixes.
> Data coming from the processors (ports 8521, 8522) and the multiplexer
> (port 8600) is validated.
>
>
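
Inline note: to keep the validated and unvalidated ZMQ streams apart, the port list above can be written down as a small lookup table. This is only a sketch: the port numbers and roles come from the notes, while the `Stream` type, the helper names, and the assumption that the bare host name "vanadium" resolves from where you run this are mine.

```python
from typing import List, NamedTuple


class Stream(NamedTuple):
    role: str
    validated: bool


# Ports and roles as listed in the notes above.
STREAMS = {
    8421: Stream("forwarder", False),
    8422: Stream("forwarder", False),
    8521: Stream("processor", True),
    8522: Stream("processor", True),
    8600: Stream("multiplexer", True),
}


def endpoint(port: int, host: str = "vanadium") -> str:
    """Build a ZMQ-style TCP endpoint string for a known port."""
    if port not in STREAMS:
        raise KeyError(f"unknown EventLogging ZMQ port: {port}")
    return f"tcp://{host}:{port}"


def validated_ports() -> List[int]:
    """Return the ports whose data has passed validation."""
    return sorted(p for p, s in STREAMS.items() if s.validated)
```

A consumer that only wants validated events would then restrict itself to the endpoints built from `validated_ports()`.
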
>
> Have fun,
> Christian
>
>
>
> [1] Available as
>   stats1002:/a/eventlogging/archive/all-events.log-$DATE.gz
>   stats1003:/srv/eventlogging/archive/all-events.log-$DATE.gz
>   vanadium:/var/log/eventlogging/...
>
> [2] Available as
>   stats1002:/a/eventlogging/archive/client-side-events.log-$DATE.gz
>   stats1002:/a/eventlogging/archive/server-side-events.log-$DATE.gz
>   stats1003:/srv/eventlogging/archive/client-side-events.log-$DATE.gz
>   stats1003:/srv/eventlogging/archive/server-side-events.log-$DATE.gz
>   vanadium:/var/log/eventlogging/...
>
> [3]
> https://git.wikimedia.org/commitdiff/operations%2Fpuppet.git/f85b1dbcd61bbb58684ff93704c1804e808a5d6e
>
> [4] https://bugzilla.wikimedia.org/show_bug.cgi?id=66528
>
> [5]
> https://git.wikimedia.org/commitdiff/operations%2Fpuppet.git/05b4027973c59b0a786433f8dae2bd1fe28b614f
>
>
>
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>                            Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3     Email:  [email protected]
> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>                              Fax:            +43 7946 / 20 5 81
>                              Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
>


-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
