On Thu, Mar 20, 2014 at 3:49 PM, Toby Negrin <[email protected]> wrote:
> We will work with Ori to understand what level of effort is required to
> support EventLogging. It's likely that Analytics and techops (and Ori) will
> need to collaborate on what will need to be done.
>

* The Ganglia scripts need to be fixed.
* A daily report should go out reporting the number of valid and invalid
  events logged, broken down by schema.
* Someone needs to scan that report for anything unusual, file bugs for code
  that violates its data model, and follow up with the relevant team to
  ensure a fix.
* Alerts need to be responded to.
* Once a month, the backup process (vanadium -> stat1001 -> tridge) should
  get a quick look-over to ensure that it is functioning.
* Once every six months, a drill should be conducted to test system failover
  and recovery procedures.
* There should be a designated person to provide technical advice and Gerrit
  code review for new instrumentation code. (This has already scaled beyond
  just me -- folks like Matt F, Yuvi, Jon, Bryan, etc. have the requisite
  expertise. But someone needs to own this, and be accountable that code
  review happens in a prompt fashion.)
* Bugs reported in Bugzilla should be acknowledged and resolved.

Toby, I think you guys have the requisite talent and capacity to handle it
internally.
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
