As a part of code review for PRs, I think it's time we should start giving the 
logging calls some scrutiny.


Most of the logging calls are just leftover debug print statements disguised as 
logging, from back when we had no debug/trace capability.


We should be removing the logging calls that make no sense as actual logging. I 
haven't turned on logging for any reason for years. There are CLI tests that 
turn it on, and to some degree just display how not-useful it is.


While they are largely just clutter now, some of them do contain, by their 
format strings, useful code documentation that should be preserved as a comment.


This of course begs the question of what logging is for. I.e., when to use it. 
I think code reviews should suggest places to add logging, places where logging 
calls that are spurious should be removed/converted-to-comments.


The times not to use logging - not as a debug print statement,  not as an 
invariant or precondition check. Not as a replacement for tracer/debugger 
designed for end-users to use.  Not as part of test facilities for the code 
base. E.g.,  I think the TDML runner should not (and does not) use logging.


Logging calls are to assist with understanding what is going on in the software 
when it is deployed. When a system using daffodil is not doing what is 
intended/desired, people will want to see some evidence/log of what daffodil is 
doing so as to isolate the problem to either something in Daffodil, in their 
DFDL schema, or something in another part of the system.


Logging examples:

  *   In schema compiler - core optimizations that happen/don't happen are 
worth logging - e.g., if a schema is "not scannable" meaning it is a mix of 
text and binary, or mixture of different character set encodings - that's worth 
logging, as it's a core top level thing that if expected and not found, 
indicates a significant issue in the schema.
  *   Successful/failed schema compilation.
  *   Schema definition warnings.
  *   Successful save/restore of processor.
  *   Every top level parse/unparse call - logging of its success or failure 
and location in the data stream (for messaging API)
  *   Every top level parse/unparse call's validation errors (not each 
individual validation error, but all of them at the top level)
  *   Exceptional situations like uncaught throws, aborts/invariant failures. 
These should abort, but a real application has to recover from such failures, 
so they need to be logged.

There are undoubtedly many others, but the basic observation is that these are 
fairly coarse grained. Individual parse/unparse actions like scanning for a 
delimiter aren't things that we should be logging.

Thoughts?

Reply via email to