On 9/15/20 9:43 AM, Alex Ezell wrote:

> Do we use levels for any of these error log outputs? That is, are they
> classified on output as High, Medium, Low, Info, or something like that?

To an extent, yes. We have separate channels for PHP errors and exceptions, for example, and although I don't think we currently differentiate in logstash, we could plausibly draw a further distinction between PHP error levels. Intuitively, a low number of PHP notices probably indicates something of lower severity than a high number of fatals, and so forth.
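As a rough illustration of that intuition, here is a minimal sketch in Python. The level names are PHP's standard error constants, but the weights, the `SEVERITY_WEIGHT` table, and the `severity_score` function are hypothetical and made up for this example; nothing here reflects how our logging is actually set up.

```python
from collections import Counter

# Hypothetical weights per PHP error level (illustrative values only).
SEVERITY_WEIGHT = {
    "E_ERROR": 100,     # fatal
    "E_WARNING": 10,
    "E_NOTICE": 1,
    "E_DEPRECATED": 1,
}


def severity_score(levels):
    """Sum the hypothetical weights over an iterable of PHP error level names.

    Levels missing from the table fall back to a weight of 1.
    """
    counts = Counter(levels)
    return sum(SEVERITY_WEIGHT.get(level, 1) * n for level, n in counts.items())


few_notices = ["E_NOTICE"] * 5
many_fatals = ["E_ERROR"] * 50

# A handful of notices scores far lower than a burst of fatals.
print(severity_score(few_notices), severity_score(many_fatals))
```

A weighted count like this only ranks volume by level; it says nothing about what any individual message actually means.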

Teasing out more detail about reported error severity could be a useful exercise, but I'm not sure it would result in much more meaningful signals than we currently have about production health. Serious problems can manifest as trivial-seeming notices, some issues start out that way and cascade over time, and generally any form of recurring logspam needs human evaluation before we can easily say much more than "this is a problem".

> Or do we have to triage each of them as we examine them?

Yeah. There are doubtless a lot of ways to improve the tooling we use for that process, but right now I think it would be most helpful if we just had more eyes _routinely_ on the logs and the workboard. (See Tyler's earlier and much more detailed/thoughtful response to this thread.)

--
Brennen Bearnes
Release Engineering

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
