Re: [Wikitech-l] 📈 Wikimedia production errors help

Brennen Bearnes Tue, 15 Sep 2020 10:06:24 -0700

On 9/15/20 9:43 AM, Alex Ezell wrote:

> Do we use levels for any of these error log outputs? That is, are they
> classified on output as High, Medium, Low, Info, or something like that?

To an extent, yes. We have separate channels for PHP errors and exceptions, for example, and although I don't think we currently differentiate in logstash, maybe we could plausibly draw a further distinction between PHP error levels. Intuitively, a low number of PHP notices probably indicates something of lower severity than a high number of fatals, and so forth.

Teasing out more detail about reported error severity could be a useful exercise, but I'm not sure it would result in much more meaningful signals than we currently have about production health. Serious problems can manifest as trivial-seeming notices, some issues start out that way and cascade over time, and generally any form of recurring logspam needs human evaluation before we can easily say much more than "this is a problem".

> Or do we have to triage each of them as we examine them?

Yeah. There are doubtless a lot of ways to improve the tooling we use for that process, but right now I think it would be most helpful if we just had more eyes _routinely_ on the logs and the workboard. (See Tyler's earlier and much more detailed/thoughtful response to this thread.)


--
Brennen Bearnes
Release Engineering

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
