Sylvain Lebresne created CASSANDRA-4861:
-------------------------------------------

             Summary: Consider separating tracing from log4j
                 Key: CASSANDRA-4861
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4861
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Sylvain Lebresne
             Fix For: 1.2.0 beta 2


Currently, (as far as I understand) tracing is implemented as a log4j appender 
that "intercepts" all log messages and write them to a system table. I'm sorry 
to not have bring that up during the initial review (it's hard to follow every 
ticket) but before we release this I'd like to have a serious discussion on 
that choice because I'm not convinced (at all) that it's a good idea. Namely, I 
can see the following drawbacks:
# the main one is that this *forces* every debug messages to be traced and 
conversely, every traced message to be logged at debug. But I strongly think 
that debug logging and query tracing are not the same thing. Don't get me 
wrong, there is clearly a large intersection between those two things (which is 
fine), but I do think that *identifying* them is a mistake. More concretely:
 ** Consider some of the messages we log at debug in CFS:
   {noformat}
   logger.debug("memtable is already frozen; another thread must be flushing 
it");
   logger.debug("forceFlush requested but everything is clean in {}", 
columnFamily);
   logger.debug("Checking for sstables overlapping {}", sstables);
   {noformat}
   Those messages are useful for debugging and have a place in the log at 
debug, but they are noise as far as query tracing is concerned (None have any 
concrete impact on query performance, they just describe what the code has 
done). Or take the following ones from CompactionManager:
   {noformat}
   logger.debug("Background compaction is still running for {}.{} ({} 
remaining). Skipping", new Object[] {cfs.table.name, cfs.columnFamily, count});
   logger.debug("Scheduling a background task check for {}.{} with {}", new 
Object[] {cfs.table.name, cfs.columnFamily, 
cfs.getCompactionStrategy().getClass().getSimpleName()});
   logger.debug("Checking {}.{}", cfs.table.name, cfs.columnFamily);
   logger.debug("Aborting compaction for dropped CF");
   logger.debug("No tasks available");
   logger.debug("Expected bloom filter size : " + expectedBloomFilterSize);
   logger.debug("Cache flushing was already in progress: skipping {}", 
writer.getCompactionInfo());
   {noformat}
   It is useful to have that in the debug log, but how is any of that useful to 
users in query tracing? (it may be useful to trace if a new compaction start or 
stop, because that does influence query performance, but those message do not). 
Also take the following message logged when a compaction is user
   interrupted:
   {noformat}
   if (t instanceof CompactionInterruptedException)
   {
       logger.info(t.getMessage());
       logger.debug("Full interruption stack trace:", t);
   }
   {noformat}
   I can buy that you may want the first log message in the query tracing, but 
the second one is definitively something that only make sense for debug logging 
but not for query tracing (and as a side note, the current implementation don't 
do something sensible as it traces "Full interruption stack trace:" but 
completely ignore the throwable).
   Lastly, and though that's arguably more a detail (but why would we settle 
for something good enough if we can do better) I believe that in some cases you 
want an event to be both logged at debug and traced but having different 
messages could make sense. For instance, in CFS we have
   {noformat}
   logger.debug("Snapshot for " + table + " keyspace data file " + 
ssTable.getFilename() + " created in " + snapshotDirectory);
   {noformat}
   I'm not convinced that snapshot should be part of query tracing given it 
doesn't really have an impact on queries, but even if we do trace it, we 
probably don't care about having one event for each snapshoted file (2 events, 
one for the start of the snapshot, one for the end would be enough).
   As it stands, I think query tracing will have a lot of random noises, which 
will not only be annoying but I'm also sure will make users spend time worrying 
about events that have no impact whatsoever. And I've only looked at the debug 
message of 2 classes ...
  ** I also think there could be case where we would want to trace something, 
but not have it in the debug log. For instance, it makes sense in the query 
trace to know how long parsing the query took. But logging too much info per 
query like that will make the debug log unmanageable in many case. And if you 
say, let's log that at TRACE, you have to put the TracingAppender at trace and 
now you get all the junk (junk as far as query tracing is concerned) that trace 
logging have.
# I find it rather unintuitive. Query tracing is enable per query and it writes 
its trace in a system table. How come changing some settings in the log4j 
config file can disable that feature for me? I agree it's not a big deal, but 
it does is some form of leaking an implementation detail.
# It doesn't seem very future-proof. For instance (and that's only an example), 
I think it could make sense to later add a tracing level. I might want a very 
detailed tracing mode where I get very fine grained details like what sstable 
was hit, and how many seeks on that sstable we did and whatnot. But as said 
above, using the log4j TRACE level for that is not convenient because it logs 
lots of stuff that are completely unrelated to my query.

And the only advantages of using log4j that I can see is that it that we don't 
have to go through all our debug statements to check which ones make sense to 
also add to query traces. But as lengthly explained above, that's not a real 
advantage as it end up generating trace that are less useful/user friendly as 
they could be.

Now maybe there is killer advantages that I don't see, and the goal of this 
ticket is to discuss those. But if there isn't, I'm very much in favor of 
moving from log4j for this.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to