leventov opened a new issue #7362: Principles for choosing logging severity URL: https://github.com/apache/incubator-druid/issues/7362 ### Motivation There are no guidelines for choosing logging severity (fatal, error, warning, info, debug, and trace) for developers. Logging is inconsistent in different classes. ### Proposed principles #### Assertion errors Currently, Druid's `Logger` class has `wtf` methods which make error-level statements with "WTF?! :" prefix. First of all, I suggest to rename it to `assertionError`. They have "ASSERTION_ERROR" prefix instead. These are for the situations when coding assumptions are broken, something that is considered impossible happens. We would otherwise throw `AssertionError` if Druid didn't want to be more resilient. Note: something like catching `IOException` from network or disk I/O methods shouldn't log on this level, because such methods can throw `IOException`. This should be reserved for situations that are considered truly impossible on "Java language" level. Note: sometimes, InterruptedExceptions should be logged on this level (rather than ERROR), when we truly don't except that anybody would interrupt our call. Action for Druid cluster operators: - Track "ASSERTION_ERROR" - Always file an issue to Druid if "ASSERTION_ERROR" appears in logs. #### Error - Failed query (due to insufficient resources, exceptions, etc.) - Failed network or disk I/O operation - Failed to load a segment on a node - Failed task - Timeouts - Usually, code like `catch (Exception e) {}` should log on this level. Note: in retry code, each failed our timed out attempt should log on this level, not the warning level. Action for Druid cluster operators: - Track "ERROR". - *Consider* filing an issue to Druid if "ERROR" (without "ASSERTION_ERROR" following) appears in logs. Not all such errors are problems with Druid, e. g. problem with network in the cluster may result in a spike in I/O action and query errors. #### Warning - Calling to an endpoint that was renamed, but the former name is still accepted for backward compatibility (e. g. see #7361) - Druid detects something that may be a sign of misconfiguration of a query, or a node, or the whole cluster (e. g. too much time spend in balancing segments) - Druid detects that some cluster resource is about to exhaust (e. g. capacity of a tier) Action for Druid cluster operators: - Track "WARNING". - Consider taking some action: call into new endpoint name, reconfigure the cluster, add historical nodes, etc. if "WARNING" appears in logs. If there are no bugs in Druid code, "WARNING" should never be something Druid cluster operators file an issue about directly. This is another criterion for choosing between the error and the warning levels for any logging statement. #### Info Some "interesting" information that may be helpful for introspection of Druid node or cluster operation, but doesn't qualify to be a warning. Action for Druid cluster operators: none, unless during an investigation of some bug or error condition. Note: information that is too mundane to be useful even for bug or error investigation should not be logged as info (and not logged at all, as proposed below) #### Debug, Trace These levels are not used in production code. Logging statements on these levels can be added during testing to inspect something (if debugging doesn't work or help) but should be removed before committing code. Rationale: if a logging statement doesn't deserve to be "info", it's net value by contaminating the code is negative. ### Future work Fixing all existing logging statements in the codebase is out of the scope of this proposal. It should be a gradual community effort after the principles are worked out. After we remove all debug and trace logging from production code (maybe making some of them "info" instead), we can add static analysis checks to prohibit adding new such statements in production code.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
