Thanks for such comprehensive notes!

On Friday, June 20, 2014, Christian Aistleitner <[email protected]> wrote:
> Hi,
>
> TL;DR: When consuming EventLogging data, only rely on the 'log'
> database available from m2 replicas, like analytics-store.eqiad.wmnet.
>
> Other representations might not get updated, might not get fix-ups, or
> may (on purpose) give you unvalidated data.
>
> ----------------------------------
>
> Due to the versatile design of EventLogging, its data exists (or
> existed) in many different representations, which left me confused
> about what data quality to expect from each. I also could not find
> these expectations documented publicly. After talking about different
> aspects with a few people, I wanted to put my current understanding up
> for public discussion.
>
> Please let me know (either privately or on the list) if something looks
> wrong or does not match your use of EventLogging data.
>
>
> * MySQL / MariaDB database on m2
>
> This database is the best place to consume EventLogging data from.
>
> It is available as the 'log' database on m2 replicas, such as
> analytics-store.eqiad.wmnet.
>
> Only validated events enter the database.
>
> In case of bugs, this database is the only place that gets fixes, such
> as cleanup of historic data or live fixes.
>
>
> * 'all-events' JSON log files [1]
>
> Use this data source only to debug issues around ingestion into the m2
> database.
>
> Entries are JSON objects.
>
> Only validated events get written.
>
> In case of bugs, historic data does not get fixed.
>
>
> * Raw client-side and server-side log files [2]
>
> Use this data source only to debug issues around ingestion into the m2
> database.
>
> Entries are the parameters of the request to event.gif. They are not
> decoded at all.
>
> In case of bugs, historic data does not get fixed. Hot-fixes are not
> guaranteed to reach these files either.
>
>
> * Kafka:
> EventLogging data has not been fed into Kafka since 2014-06-12 [3].
> The EventLogging data in Kafka had no users.
> Turning it on again is tracked in bug 66528 [4].
>
>
> * MongoDB:
> EventLogging data has not been fed into MongoDB since 2014-02-13 [5].
> The EventLogging data in MongoDB did not appear to be used.
> I am not aware of plans to resume feeding the data into MongoDB.
>
>
> * ZMQ:
> ZMQ is available from vanadium.
> In case of bugs, historic data cannot get fixed :-)
> Data coming from the forwarders (ports 8421, 8422) is not validated
> and need not see hot-fixes.
> Data coming from the processors (ports 8521, 8522) and the multiplexer
> (port 8600) is validated.
>
>
> Have fun,
> Christian
>
>
> [1] Available as
> stats1002:/a/eventlogging/archive/all-events.log-$DATE.gz
> stats1003:/srv/eventlogging/archive/all-events.log-$DATE.gz
> vanadium:/var/log/eventlogging/...
>
> [2] Available as
> stats1002:/a/eventlogging/archive/client-side-events.log-$DATE.gz
> stats1002:/a/eventlogging/archive/server-side-events.log-$DATE.gz
> stats1003:/srv/eventlogging/archive/client-side-events.log-$DATE.gz
> stats1003:/srv/eventlogging/archive/server-side-events.log-$DATE.gz
> vanadium:/var/log/eventlogging/...
>
> [3] https://git.wikimedia.org/commitdiff/operations%2Fpuppet.git/f85b1dbcd61bbb58684ff93704c1804e808a5d6e
>
> [4] https://bugzilla.wikimedia.org/show_bug.cgi?id=66528
>
> [5] https://git.wikimedia.org/commitdiff/operations%2Fpuppet.git/05b4027973c59b0a786433f8dae2bd1fe28b614f
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3     Email: [email protected]
> 4293 Gutau, Austria          Phone: +43 7946 / 20 5 81
>                              Fax: +43 7946 / 20 5 81
>                              Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
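For anyone using the file-based sources only to debug ingestion, as the notes recommend, here is a minimal Python sketch. It is not part of the original thread. It encodes the vanadium ZMQ ports and their validation status exactly as listed in the notes, and counts events per schema in one gzipped 'all-events' log file. It assumes (this is not stated in the notes) that each log line holds one JSON object with a top-level "schema" key; the function names are hypothetical.

```python
import gzip
import json
from collections import Counter

# vanadium ZMQ ports as listed in the notes: (role, validated?).
# Forwarder streams are not validated; processor and multiplexer streams are.
ZMQ_STREAMS = {
    8421: ("forwarder", False),
    8422: ("forwarder", False),
    8521: ("processor", True),
    8522: ("processor", True),
    8600: ("multiplexer", True),
}


def zmq_stream_is_validated(port):
    """Return True if events on this vanadium ZMQ port are validated."""
    try:
        return ZMQ_STREAMS[port][1]
    except KeyError:
        raise ValueError(f"unknown EventLogging ZMQ port: {port}") from None


def count_events_by_schema(path):
    """Count events per schema in one gzipped all-events log file.

    Assumption (not confirmed in the notes): one JSON object per line,
    each carrying a top-level "schema" key.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            counts[event.get("schema", "<missing>")] += 1
    return counts
```

For example, calling `count_events_by_schema` on a local copy of one archive file (the filename and date format are up to you) returns a Counter keyed by schema name. Such counts are only for cross-checking ingestion; for analysis, the notes point to the 'log' database.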
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
