[jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.

Erick Erickson (JIRA) Fri, 04 May 2018 19:32:50 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464575#comment-16464575
 ]


Erick Erickson commented on SOLR-11934:
---------------------------------------

I looked at sample logs from a client and wrote a little program to bucket 
them. NOTE: this was an index-heavy type application. Here are the number of 
log messages originating in these files:

        35 occurrences in 
org.apache.solr.common.cloud.ZkStateReader$StateWatcher;
        48 occurrences in org.apache.solr.core.QuerySenderListener;
        55 occurrences in org.apache.solr.servlet.HttpSolrCall;
     64388 occurrences in org.apache.solr.update.SolrIndexWriter;
     64413 occurrences in org.apache.solr.search.SolrIndexSearcher;
     64601 occurrences in org.apache.solr.core.SolrCore;
    128795 occurrences in org.apache.solr.update.DirectUpdateHandler2;
   3345415 occurrences in 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor;

that got me to thinking about the _value_ of certain messages when 
troubleshooting. Is the value in the individual messages or in getting a sense 
of how things are behaving?

So WDYT about logging some summary information instead of individual messages 
for high-cardinality messages? Rather than logging every update at info level, 
would it be "useful enough" when troubleshooting to log a summary of updates' 
vital statistics every N seconds?

Here's what I'm thinking. Pretty often when troubleshooting, we want to 
correlate heavy request activity for Solr to something else (GC, slowdowns, 
recoveries, whatever). Using updates as an example, what if at info level, we 
logged something like

INFO  - 2018-05-04 03:35:09.206 over the last T seconds there were N update 
requests updating D documents taking min/avg/median/max milliseconds to 
complete with min/average/median/max documents per request.

With the new metrics capabilities, that information may be available already, 
along with lots of other information (percentiles and the like).

If seeing the underlying individual requests were important, one could turn on 
finer-grained logging for this class only. We could even supply commented-out 
ways to enable voluminous messages for anything we choose to write summary 
information out in the log4j2 conifgs.

I totally understand that for gnarly problems you want (and need) voluminous 
output. I totally understand that sometimes you can't predict that up front. I 
totally understand that in some cases this would introduce an extra round-trip, 
"you have to enable this option in your logging configuration, wait for it to 
happen again and call me".

I also totally understand that in large Solr operations, the log files are 
very, very hard to use because there's so much information being dumped by 
potentially hundreds of Solr instances.

I also totally understand that I've spend more time than I like to remember 
writing programs to parse log files to give me summary information like the 
above to know where to start looking for problems. Why not have Solr do it?

This example is just for updates, but queries could do something very similar 
along with whatever else fits this pattern. Queries certainly come to mind.

The question I'm really asking is if we're logging intelligently or not. 
Basically we're throwing everything that anyone _might_ be interested into a 
log file and then requiring the users (and consultants) to figure out how to 
make it useful. We know enough about how this information is used I think to do 
some of it up front, always with the option to drop back to the finer-grained 
information when necessary.

Anyway, what do people think?

> Visit Solr logging, it's too noisy.
> -----------------------------------
>
>                 Key: SOLR-11934
>                 URL: https://issues.apache.org/jira/browse/SOLR-11934
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> I think we have way too much INFO level logging. Or, perhaps more correctly, 
> Solr logging needs to be examined and messages logged at an appropriate level.
> We log every update at an INFO level for instance. But I think we log LIR at 
> INFO as well. As a sysadmin I don't care to have my logs polluted with a 
> message for every update, but if I'm trying to keep my system healthy I want 
> to see LIR messages and try to understand why.
> Plus, in large installations logging at INFO level is creating a _LOT_ of 
> files.
> What I want to discuss on this JIRA is
> 1> What kinds of messages do we want log at WARN, INFO, DEBUG, and TRACE 
> levels?
> 2> Who's the audience at each level? For a running system that's functioning, 
> sysops folks would really like WARN messages that mean something need 
> attention for instance. If I'm troubleshooting should I turn on INFO? DEBUG? 
> TRACE?
> So let's say we get some kind of agreement as to the above. Then I propose 
> three things
> 1> Someone (and probably me but all help gratefully accepted) needs to go 
> through our logging and assign appropriate levels. This will take quite a 
> while, I intend to work on it in small chunks.
> 2> Actually answer whether unnecessary objects are created when something 
> like log.info("whatever {}", someObjectOrMethodCall); is invoked. Is this 
> independent on the logging implementation used? The SLF4J and log4j seem a 
> bit contradictory.
> 3> Maybe regularize log, logger, LOG as variable names, but that's a nit.
> As a tactical approach, I suggest we tag each LoggerFactory.getLogger in 
> files we work on with //SOLR-(whatever number is assigned when I create 
> this). We can remove them all later, but since I expect to approach this 
> piecemeal it'd be nice to keep track of which files have been done already.
> Finally, I really really really don't want to do this all at once. There are 
> 5-6 thousand log messages. Even at 1,000 a week that's 6 weeks, even starting 
> now it would probably span the 7.3 release.
> This will probably be an umbrella issue so we can keep all the commits 
> straight and people can volunteer to "fix the files in core" as a separate 
> piece of work (hint).
> There are several existing JIRAs about logging in general, let's link them in 
> here as well.
> Let the discussion begin!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.

Reply via email to