Tilman - done, though apologies for the unhelpful link formatting in that tooltip. I'll file a Phabricator bug to improve it. By the way, the annotations for the pageview data can be edited collaboratively: https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations (unlocked for now; we'll limit access if we start having problems).

On Wed, Aug 26, 2015 at 6:22 PM, Tilman Bayer <[email protected]> wrote:
> Thanks for the update! And BTW kudos also for marking these as
> annotations in the dashboard at https://vital-signs.wmflabs.org/
> (maybe link the incident reports from there as well?)
>
> On Wed, Aug 26, 2015 at 1:26 PM, Andrew Otto <[email protected]> wrote:
> > Hi all,
> >
> > Now that we’ve had a little space to analyze the problem, I wanted to call
> > out a recent webrequest data loss issue that we experienced on two separate
> > occasions.
> >
> > We attempted to upgrade to Kafka 0.8.2.1, and it wasn’t until the second
> > attempt that we actually found the problem. Kafka 0.8.2.1 ships with a
> > buggy version of Snappy[1] that causes messages to not be compressed
> > properly. This caused a ~4x increase network and disk I/O around the
> > cluster all at once.
> >
> > We’ve documented the incidents and the occasions of significant data loss
> > here:
> >
> > https://wikitech.wikimedia.org/wiki/Incident_documentation/20150803-Kafka
> >
> > https://wikitech.wikimedia.org/wiki/Incident_documentation/20150810-Kafka#Conclusions
> >
> > https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
> >
> > This loss will affect the output of pagecount* and pageview datasets, as
> > well as other webrequest generated statistics. Please consider statistics
> > that are generated from webrequest data using the following UTC hours
> > unreliable:
> >
> > 2015-08-03T18:00 - 2015-08-03T23:00
> > 2015-08-10T15:00 - 2015-08-10T21:00
> > 2015-08-11T17:00 - 2015-08-11T18:00
> >
> > Many apologies for any inconvenience this causes. We’ve learned a lot
> > during this turmoil, and have a lot of ideas on how to hopefully prevent
> > this from happening in the future, and also how to reduce loss and
> > complexity if and when it does. The analytics engineering team will be
> > doing a post mortem on this soon, in which we will document these ideas.
> >
> > Thanks,
> > -Andrew Otto
> >
> > [1] https://issues.apache.org/jira/browse/KAFKA-2189
> >
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
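
For anyone who wants to exclude the affected hours from their own analyses, here is a minimal sketch in Python of how the UTC ranges quoted above could be checked. It is not code the analytics team uses; the names `UNRELIABLE_HOURS` and `is_unreliable` are made up for illustration. It also assumes that the end of each range is inclusive (so the hour starting at 2015-08-03T23:00 counts as unreliable too), which the announcement does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# UTC hour ranges copied from the announcement quoted above.
UNRELIABLE_HOURS = [
    ("2015-08-03T18:00", "2015-08-03T23:00"),
    ("2015-08-10T15:00", "2015-08-10T21:00"),
    ("2015-08-11T17:00", "2015-08-11T18:00"),
]


def _parse_utc(text):
    """Parse a 'YYYY-MM-DDTHH:MM' string as a UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M").replace(tzinfo=timezone.utc)


def is_unreliable(ts):
    """Return True if a timezone-aware datetime falls in a flagged hour.

    ASSUMPTION: the last listed hour of each range is itself affected,
    so each range is treated as [start, end + 1 hour).
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    for start, end in UNRELIABLE_HOURS:
        if _parse_utc(start) <= ts < _parse_utc(end) + timedelta(hours=1):
            return True
    return False
```

A caller would pass each hourly bucket's timestamp to `is_unreliable` and drop or annotate the rows for which it returns True. If the end hours turn out to be exclusive, removing the `timedelta(hours=1)` term is the only change needed.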
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
