Re: [Dspam-devel] Improve logging

Tom Hendrikx Thu, 25 Aug 2011 07:29:21 -0700

On 25/08/11 15:31, Stevan Bajić wrote:
>  On Thu, 25 Aug 2011 00:54:39 +0200, Tom Hendrikx wrote:
>> On 24/08/11 23:40, Stevan Bajic wrote:
>>> On 24.08.2011 21:27, Tom Hendrikx wrote:
>>>> Hello,
>>> Hello,
>>>
>>>> I'm currently looking into possibilities to improve regular logging
>>>> output for DSPAM. Currently, DSPAM does not log anything when 
>>>> running
>>>> normally. There is no easy way to see regular activity within a 
>>>> DSPAM
>>>> process. There are ways around this such as enabling TrackSources 
>>>> which
>>>> logs 1 line for each processed message (under most circumstances), 
>>>> but
>>>> this is far from ideal.
>>>>
>>>> The idea is to make DSPAM output some informal logging about its 
>>>> own
>>>> activity. To do this, my general idea is like this:
>>>>
>>>> - Add LOG(LOG_INFO, "informal message") to applicable places in the 
>>>> code
>>>>
>>>> - Add the actual strings to language.h in the same way current 
>>>> logging
>>>> (90% errors) is done.
>>>>
>>>> - Convert some of the existing debug messages to LOG(LOG_DEBUG...
>>>> statements and enable them even when --debug is disabled
>>> this will result in bigger binaries and more memory needed to run 
>>> DSPAM.
>>> Currently DSPAM does not compile some code if --debug is disabled.
>>
>> This is meant for debugging output that is not inside ifdef blocks, 
>> also
>> see the part below about --debug etc.
>>
>> Some (not all) of the current LOGDEBUG output is also valid for 
>> regular
>> end user debugging, so we could just convert them to LOG(LOG_DEBUG, 
>> ...
>> in stead of duplicating them.
>>
>  I was not expecting to double the code/output. All I was saying is that 
>  a lot of debug code right now is NOT compiled into the binary if you DO 
>  NOT enable --debug. So without --debug = less code. Your proposal will 
>  remove that benefit and add additional code to the binary. And since you 
>  propose to add different logging levels, each of those levels will add 
>  more code to the binary.
>


I think you misunderstood me. There are currently 2 ways to raise verbosity:

- set --enable-debug/--enable-bnr-debug/--enable-verbose-debug at
*compile time*: this enables extra code to be included in the binary. I
do not want to change this.
- set --debug at *runtime*: this makes LOGDEBUG calls in the code
actually produce output.

However there are also calls to LOGDEBUG that are available at runtime
even when ---enable-debug etc was not set at compile time. (for an
example see dspam.c, around line 265). Converting these calls to
LOG(LOG_DEBUG, ..) will not enable extra code in the binary, as it is
already there and always available (only the call to LOGDEBUG is a
no-op). Thus, this will only create more code *execution*, not more code
*generation*.

My proposal would largely obsolete the use of --debug at runtime, and
replace it with a change of the LogLevel directive in dspam,conf. We
could easily update handling of --debug at runtime to set LogLevel to
'debug', which will yield the same result.

The purpose and effect of enabling debugging at compile time is not
being changed.

> 
>>>
>>>> - Add a directive to dspam.conf that enables the user to change
>>>> verbosity: LogLevel {debug,info,notice,warn,critical,alert,emerg} 
>>>> and
>>>> log all messages above and including this level to the regular 
>>>> logging
>>>> channel (syslog or ./configure --with-logfile=<path>)
>>>>
>>>>
>>>> The general logging output should pass the following requirements:
>>>>
>>>> - Each (interesting) event should ad one (and only one) line to the 
>>>> log
>>>>
>>>> - A log entry should be short but informative, and contain as much
>>>> relevant information as possible.
>>>>
>>>> - LogLevel set to "info" should result in output that enables an
>>>> administrator (i.e. no developer) to see high level message 
>>>> handling on
>>>> a per-message basis.
>>>>
>>>> - LogLevel set above "info" should only output issues, as is 
>>>> currently
>>>> done (albeit because no messages of level "info" or "lower" are
>>>> available in the code).
>>>>
>>>> - LogLevel set to "debug" should add output that helps 
>>>> administrators to
>>>> find configuration issues.
>>>>
>>>> - Output that is only helpful for developers or significantly 
>>>> impacts
>>>> performance (because extra calculations/queries/etc are needed) 
>>>> should
>>>> be logged only when
>>>> --enable-debug/--enable-bnr-debug/--enable-verbose-debug options 
>>>> are
>>>> enabled at compile time. This is currently already being done.
>>>>
>>>> Other things to consider:
>>>>
>>>> We could write all the debug output to the regular logging channel,
>>>> thereby eliminating the current (IMHO annoying) practice of logging 
>>>> to
>>>> different files.
>>> What different files? Do you mean the SQL logging and the message
>>> content logging?
>>
>> Id like to eliminate the dspam.debug, bnr.log and sql.errors (which 
>> are
>> mostly not errors but queries) files as written to the logdir.
>>
>  sql.errors is usually used for SQL errors. If you enable debugging then 
>  the sql.errors is misused as a query log. dspam.debug is really just 
>  debug and bnr.log is a very verbose log for BNR processing.
> 
> 
>> NB The files used for the webui (system.log and all other files in 
>> dspam
>> homedir) are not conventional logging and serve no actual logging
>> purpose: their intention is to provide data to the webui. They are 
>> not
>> subject to this conversation.
>>
>>>
>>>> Current error messages can be enhanced when possible. Depends on 
>>>> the
>>>> case, but adding debug/warning messages deeper inside the code 
>>>> should
>>>> reveal more information when someone is interested.
>>>>
>>>> If needed we could add some unique identifier for each message 
>>>> passing
>>>> though DSPAM, so log lines for each messages can be grouped 
>>>> together
>>>> easily (MTA queue id alike).
>>> Currently DSPAM does not have such code/id because it does not have 
>>> a
>>> queue management.
>>
>> We could generate a semi-unique id for each message, not for queue
>> management, but for log file readability.
>>
>  Yeah. Without such a number and with the need to log more then one line 
>  per message, we would need such a id in order to concatenate log entries 
>  belonging to one message.
> 
> 
>> I have a low-traffic site so this would be not needed for me 
>> personally,
>> but i'd expect this to come in handy for larger sites that process
>> multiple messages at the same time. Sounds like good practice to me.
>>
>>>
>>>> Example
>>>> -------
>>>> We are running dspam as daemon and a message is processed. Loglevel 
>>>> is
>>>> set to "info". This means that a line is logged for each of the 
>>>> events:
>>>>
>>>> - external entity connects to dspam via LMTP or dspamc
>>>> - dspam classifies message, and decides what the result is
>>>> - dspam handles message according to result (quarantine/send over 
>>>> smtp/etc/)
>>>>
>>>> Thus, normal execution of a message results in 3 log lines. This 
>>>> seems
>>>> reasonable to be, since other applications (f.i. postfix) are able 
>>>> to
>>>> produce much more logging without harming performance.
>>> Without harming performance? No way! Each logging is by definition
>>> harming performance.
>>
>> Correction: harming performance not too much. Considering the amount 
>> of
>> work that DSPAM does, pushing a string towards the syslog socket 
>> (which
>> it is already connected to) is only minor overhead, I think. Anyway,
>> this is tunable when performance issues might arise.
>>
>  Sending those little strings to syslog socket sounds like nothing but 
>  such little things can degenerate performance. I know it sounds 
>  ridiculous but any output is harming performance.
> 
> 
>> I'll ask around tomorrow if any performance is gained (or needed) by
>> limiting log output on some high-traffic machines with talky software
>> (postfix is very talky).

I chatted a bit with my company sysadmins, but they never felt the need
to tune logging verbosity for chatty services such as postfix. Note that
postfix logs at least 5 lines for each accepted message.

>>
>> For good measure: I believe in "Premature optimization is the root of
>> all evil" :)
>>
>  Well... no optimization has been done. It just was never there.
>  btw: is not the opposite gender the root of all evil? ;)

Only during that special time of the month :P

> 
>>>
>>>> As I'm sure I'm not the only person that is annoyed by the lack of
>>>> regular processing output in DSPAM, I'd like to hear your ideas and
>>>> comments on the above.
>>> I am not against additional logging. The question that I have is: 
>>> Why
>>> the additional logging? What is the purpose of it? What goal do you 
>>> try
>>> to reach with more logging?
>>
>> Ever since the first time I setup DSPAM this was a pita. Various 
>> things
>> come to mind:
>>
>> - When setting up DSPAM, you are looking for issues in the logfile or
>> stdout while configuring. Having no output at all is confusing, as
>> no-one is telling that you are doing a good job.
>>
>  This is configuration thing. Nothing that will be globally solved with 
>  more logging on a per message basis.

There is no way to configure dspam to tell you that configuration is
working as intended. It only tells you when something is wrong. SO no
configuration issue.

> 
>> - When setting up DSPAM (and you are doing it right), a message might
>> get lost somewhere in your MTA->DSPAM->MTA->IMAP pipeline (insert 
>> more
>> hops when needed). It's nice that you can easily determine that DSPAM 
>> is
>> working as intended, so you can concentrate on somewhere else. 
>> However,
>> DSPAM does nothing to inform you of correct processing.
>>
>  It does mention stuff in system.log and/or can do some logging with 
>  TracSources. Anyway.... to take the example from above:
>  -> MTA -> DSPAM -> MTA -> WHATEVER

TrackSources doesn't work on a new setup, it only starts emitting data
after 1.5k messages. Also, Tracksources is no real logging: it has no
actual information about the work DSPAM has done.

I only abuse it because it is the only way to get some per-message output.

> 
>  When the message leaves your MTA then the MTA is logging this and when 
>  the message is reinjected back from DSPAM to the MTA then the MTA is 
>  logging that too. Right? So you know that DSPAM is working.
> 

This only applies when your setup is working correctly. When something
is wrong, it is really easy when you are able to verify for each step
works as intended. DSPAM does not tell you that it is working, you can
only see that because the message comes back on the next hop.

> 
>> - The 3 lines a week
>>
>  A week? You mean per message?
> 

By default, only startup messges are logged. If I don't restart DSPAM,
nothing gets logged. You need to enable TrackSources (which is no real
logging) to get some per-message output.

> 
>> that are currently logged on my setup, could also
>> be passed to stderr, and logging infrastructure within DSPAM would no
>> longer be needed at all. DSPAM wants syslog or logfile specified at
>> compile time, so you expect it be used, but this not really done. 
>> This
>> is counterintuitive.
>>
>> - More information == good.
>>
>  I am to long doing IT to know that this is not true. More information 
>  is not imperatively good.
>  The RIGHT information at the RIGHT time == good.
>  All other (more) information is usually useless.
>  

By default, nothing is logged. By misusing Tracksources, you can
convince it to log a single line without much information, but only for
regular processing, not for retraining etc. To me this is not enough
information.

Anyway, anyone who is not interested in logging can set verbosity to
critical, so send log messages to /dev/null. But in the current
situation, there is no choice.

> 
>> People can add logparsers to their
>> monitoring setup, and don't need to rely on half-fabricates such as
>> system.log, or the output of dspam_stats etc. I wrote working munin
>> plugins around both of them, but log parsing is more flexible, much
>> easier and more fun.
>>
>> - You are saying 'additional logging' and 'more logging', but by 
>> default
>> there is no logging at all: a non-debug setup only logs startup and
>> shutdown messages. As an administrator, I value informative logging 
>> just
>> as much as a working setup: it helps you to find errors and 
>> anomalies,
>> and tells you what is happening.
>>
>  Off course you are right.
> 
> 
>> - No informative logging output feels awkward: it seems as if DSPAM 
>> does
>> not really know what it is doing it self,
>>
>  Just because DSPAM does not do that much logging does not mean that it 
>  self does not know what it is doing.

Yes, but explain that to an administrator that is new to DSPAM, has a
hard time setting it up and has not yet had any positive experiences
with it.

> 
> 
>> since it won't tell about it.
>> (this is subjective and no hard fact, but still true)
>>
>  So a calm person is in your viewpoint a person not knowing what it does 
>  it self? LOL :)
> 

It is more that there is no way that you can ask DSPAM how it is doing,
except by setting his head on fire (recompile with debugging enabled) :P

------------------------------------------------------------------------------
EMC VNX: the world's simplest storage, starting under $10K
The only unified storage solution that offers unified management 
Up to 160% more powerful than alternatives and 25% more efficient. 
Guaranteed. http://p.sf.net/sfu/emc-vnx-dev2dev
_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel

Re: [Dspam-devel] Improve logging

Reply via email to