Thanks for reporting this.

Pine

On Aug 26, 2015 1:27 PM, "Andrew Otto" <[email protected]> wrote:
> Hi all,
>
> Now that we’ve had a little space to analyze the problem, I wanted to call
> out a recent webrequest data loss issue that we experienced on two separate
> occasions.
>
> We attempted to upgrade to Kafka 0.8.2.1, and it wasn’t until the second
> attempt that we actually found the problem. Kafka 0.8.2.1 ships with a
> buggy version of Snappy[1] that causes messages to not be compressed
> properly. This caused a ~4x increase in network and disk I/O around the
> cluster all at once.
>
> We’ve documented the incidents and the occasions of significant data loss
> here:
>
> https://wikitech.wikimedia.org/wiki/Incident_documentation/20150803-Kafka
>
> https://wikitech.wikimedia.org/wiki/Incident_documentation/20150810-Kafka#Conclusions
>
> https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
>
> This loss will affect the output of pagecount* and pageview datasets, as
> well as other webrequest generated statistics. Please consider statistics
> that are generated from webrequest data using the following UTC hours
> unreliable:
>
> 2015-08-03T18:00 - 2015-08-03T23:00
> 2015-08-10T15:00 - 2015-08-10T21:00
> 2015-08-11T17:00 - 2015-08-11T18:00
>
> Many apologies for any inconvenience this causes. We’ve learned a lot
> during this turmoil, and have a lot of ideas on how to hopefully prevent
> this from happening in the future, and also how to reduce loss and
> complexity if and when it does. The analytics engineering team will be
> doing a post mortem on this soon, in which we will document these ideas.
>
> Thanks,
> -Andrew Otto
>
> [1] https://issues.apache.org/jira/browse/KAFKA-2189
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
