Tilman - done, but apologies for the unhelpful link formatting in that
tooltip.  I'll file a Phabricator bug to improve it.  By the way,
annotations for the pageview data can be collaboratively edited at
https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations (unlocked for
now; we'll limit access if we start having problems).

On Wed, Aug 26, 2015 at 6:22 PM, Tilman Bayer <[email protected]> wrote:

> Thanks for the update! And BTW kudos also for marking these as
> annotations in the dashboard at https://vital-signs.wmflabs.org/
> (maybe link the incident reports from there as well?)
>
> On Wed, Aug 26, 2015 at 1:26 PM, Andrew Otto <[email protected]> wrote:
> > Hi all,
> >
> > Now that we’ve had a little space to analyze the problem, I wanted to
> > call out a recent webrequest data loss issue that we experienced on two
> > separate occasions.
> >
> > We attempted to upgrade to Kafka 0.8.2.1, and it wasn’t until the second
> > attempt that we actually found the problem.  Kafka 0.8.2.1 ships with a
> > buggy version of Snappy[1] that causes messages to not be compressed
> > properly.  This caused a ~4x increase in network and disk I/O across the
> > cluster all at once.
> >
> > We’ve documented the incidents and the occasions of significant data loss
> > here:
> >
> > https://wikitech.wikimedia.org/wiki/Incident_documentation/20150803-Kafka
> >
> > https://wikitech.wikimedia.org/wiki/Incident_documentation/20150810-Kafka#Conclusions
> >
> > https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
> >
> > This loss will affect the output of pagecount* and pageview datasets, as
> > well as other webrequest-generated statistics.  Please consider
> > statistics generated from webrequest data for the following UTC hours
> > unreliable:
> >
> >   2015-08-03T18:00 - 2015-08-03T23:00
> >   2015-08-10T15:00 - 2015-08-10T21:00
> >   2015-08-11T17:00 - 2015-08-11T18:00
> >
> > Many apologies for any inconvenience this causes.  We’ve learned a lot
> > during this turmoil, and have a lot of ideas on how to hopefully prevent
> > this from happening in the future, and also how to reduce loss and
> > complexity if and when it does.  The analytics engineering team will be
> > doing a post mortem on this soon, in which we will document these ideas.
> >
> > Thanks,
> > -Andrew Otto
> >
> > [1] https://issues.apache.org/jira/browse/KAFKA-2189
> >
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
>
>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>
>