Hi,

in the week from 2014-10-06 to 2014-10-12, Andrew, Jeff, and I worked
on the following items around the Analytics Cluster and
Analytics-related Ops:

* ULSFO outage affecting webrequest logs (Bug 71876, Bug 71879)
* Revoked default Push grant for Analytics on gerrit's analytics/* projects
* Wikimetrics showing many requests to internal files
* Counting pageviews for the pages “undefined” / “Undefined” (Bug 66532)
* Counting redirect pageviews for Webstatscollector (Bug 71790)
* Reworking webstatscollector's build system
* Puppetization of MaxMind's Connection Type databases
* Wikihadoop now available on the Analytics Cluster
* Analytics Mini-Hackathon in San Francisco
(details below)

Have fun,
Christian



* ULSFO outage affecting webrequest logs (Bug 71876, Bug 71879)

It seems there were connection issues from ULSFO, which caused a
minor hiccup in the webrequest logs on both udp2log and kafka [1].
Thanks to its buffering, kafka nicely bridged the shorter dropouts, so
in total only a few minutes of data were lost on kafka, while udp2log
was shaky for up to 2 hours.


* Revoked default Push grant for Analytics on gerrit's analytics/* projects

By default, all Analytics members had Push permission on all of
gerrit's analytics/* projects. As accidental pushes had caused pain
again, we revoked the default Push grant and made sure that our bots
still have the permissions they need to do their duty.


* Wikimetrics showing many requests to internal files

A fix for the mis-redirection of these monitoring requests has been
implemented, but it is not yet deployed.


* Counting pageviews for the pages “undefined” / “Undefined” (Bug 66532)

A short-lived increase in requests for the pages “undefined” and
“Undefined” impacted pageview trend graphs. So, after the initial
push-back it had received, bug 66532 was picked up again, and we
prepared patches for both the C and the Hive implementation of
webstatscollector's pageview definition so that such requests are no
longer counted. These patches will likely be deployed around
2014-10-15.
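To illustrate the idea, here is a minimal sketch of such a filter. It
is not the actual webstatscollector patch: the function name
is_counted_pageview is hypothetical, and it assumes the page title has
already been extracted from the request URL. Only the two exact
spellings from the bug are excluded, so other capitalizations are
still counted.

```python
# Hypothetical sketch (not the actual webstatscollector patch): skip
# requests for the titles "undefined" and "Undefined" (Bug 66532).
# Assumes the page title has already been extracted from the request URL.

# Only the two exact spellings mentioned in the bug are excluded.
EXCLUDED_TITLES = frozenset({"undefined", "Undefined"})


def is_counted_pageview(title: str) -> bool:
    """Return False for titles that should not count as pageviews."""
    return title not in EXCLUDED_TITLES


titles = ["Main_Page", "undefined", "Undefined", "UNDEFINED"]
counted = [t for t in titles if is_counted_pageview(t)]
print(counted)  # ['Main_Page', 'UNDEFINED']
```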


* Counting redirect pageviews for Webstatscollector (Bug 71790)

The webstatscollector pageview definition has always counted
redirects, and has hence been overcounting. Since we are about to
deploy webstatscollector changes anyway, we prepared changes to fix
this long-standing miscounting as well.
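A minimal sketch of excluding redirects from the count follows. It
assumes, purely for illustration, that a redirect can be recognized by
an HTTP 3xx redirect status in the webrequest log; the criterion used
in the actual patches may differ, and the name is_redirect is
hypothetical. Note that 304 (Not Modified) is a cache revalidation,
not a redirect, so it is still counted here.

```python
# Hypothetical sketch: do not count redirects as pageviews (Bug 71790).
# ASSUMPTION: redirects are identified by their HTTP status code; the
# actual webstatscollector patches may use a different criterion.

# HTTP redirect status codes. 304 (Not Modified) is deliberately absent:
# it answers a cache revalidation and is not a redirect.
REDIRECT_STATUSES = frozenset({301, 302, 303, 307, 308})


def is_redirect(http_status: int) -> bool:
    """Return True if the response status denotes an HTTP redirect."""
    return http_status in REDIRECT_STATUSES


requests = [("Main_Page", 200), ("Foo", 301), ("Bar", 304), ("Baz", 302)]
counted = [title for title, status in requests if not is_redirect(status)]
print(counted)  # ['Main_Page', 'Bar']
```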


* Reworking webstatscollector's build system

Freshly compiled executables of webstatscollector's C implementation
segfaulted. So we fixed some NULL pointer dereferences, fixed the
build system, made it possible to compile with optimizations turned
on, and built a rudimentary test suite for the collector process. As a
result, we can again build the collector executable, and can
automatically verify that it is working.


* Puppetization of MaxMind's Connection Type databases

MaxMind's Connection Type (NetSpeed) databases have been
puppetized. They are available, for example, on stat1002 and stat1003
at

  /usr/share/GeoIP/GeoIPNetSpeedCell.dat
  /usr/share/GeoIP/GeoIPNetSpeed.dat


* Wikihadoop now available on the Analytics Cluster

This allows for easier parsing of MediaWiki XML revision dumps.


* Analytics Mini-Hackathon in San Francisco

During this week, the Analytics Mini-Hackathon took place. More
prototyping happened around
** Sqoop and Oozification
** Streaming data into HDFS
Additionally, some time was spent hunting down the kafkatee issues.



-- 
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  [email protected]
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics