If people are interested in zooming into the loss data a little more, there's a zoomable graph here: http://debugging.wmflabs.org/

On Wed, Aug 26, 2015 at 4:54 PM, Pine W <[email protected]> wrote:

> Thanks for reporting this.
>
> Pine
> On Aug 26, 2015 1:27 PM, "Andrew Otto" <[email protected]> wrote:
>
>> Hi all,
>>
>> Now that we’ve had a little space to analyze the problem, I wanted to
>> call out a recent webrequest data loss issue that we experienced on two
>> separate occasions.
>>
>> We attempted to upgrade to Kafka 0.8.2.1, and it wasn’t until the second
>> attempt that we actually found the problem. Kafka 0.8.2.1 ships with a
>> buggy version of Snappy[1] that causes messages to not be compressed
>> properly. This caused a ~4x increase in network and disk I/O around the
>> cluster all at once.
>>
>> We’ve documented the incidents and the occasions of significant data loss
>> here:
>>
>> https://wikitech.wikimedia.org/wiki/Incident_documentation/20150803-Kafka
>>
>> https://wikitech.wikimedia.org/wiki/Incident_documentation/20150810-Kafka#Conclusions
>>
>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
>>
>> This loss will affect the output of pagecount* and pageview datasets, as
>> well as other webrequest generated statistics. Please consider statistics
>> that are generated from webrequest data using the following UTC hours
>> unreliable:
>>
>> 2015-08-03T18:00 - 2015-08-03T23:00
>> 2015-08-10T15:00 - 2015-08-10T21:00
>> 2015-08-11T17:00 - 2015-08-11T18:00
>>
>> Many apologies for any inconvenience this causes. We’ve learned a lot
>> during this turmoil, and have a lot of ideas on how to hopefully prevent
>> this from happening in the future, and also how to reduce loss and
>> complexity if and when it does. The analytics engineering team will be
>> doing a post mortem on this soon, in which we will document these ideas.
>>
>> Thanks,
>> -Andrew Otto
>>
>> [1] https://issues.apache.org/jira/browse/KAFKA-2189
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
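
For anyone who wants to exclude the affected hours from their own analysis, here is a rough Python sketch of a filter. The three windows are copied from Andrew's message; everything else is an assumption for illustration: the name is_unreliable is hypothetical, timestamps are expected to be timezone-aware, and the end hour of each window is treated as included (the message does not say whether it is, so this errs on the cautious side).

```python
from datetime import datetime, timezone

# Affected windows copied from the announcement (UTC hours).
# Assumption: both endpoints name whole hours and the end hour is included.
UNRELIABLE_WINDOWS = [
    ("2015-08-03T18:00", "2015-08-03T23:00"),
    ("2015-08-10T15:00", "2015-08-10T21:00"),
    ("2015-08-11T17:00", "2015-08-11T18:00"),
]


def _parse_utc(stamp):
    """Parse 'YYYY-MM-DDTHH:MM' as a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M").replace(tzinfo=timezone.utc)


WINDOWS = [(_parse_utc(start), _parse_utc(end)) for start, end in UNRELIABLE_WINDOWS]


def is_unreliable(ts):
    """Return True if ts falls in an hour flagged as unreliable (hypothetical helper)."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Truncate to the start of the UTC hour that contains ts.
    hour = ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return any(start <= hour <= end for start, end in WINDOWS)
```

Rows or hourly files whose timestamp makes is_unreliable return True can then be dropped or flagged before aggregating. If the end hours turn out to be exclusive, changing `hour <= end` to `hour < end` is the only adjustment needed.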
