People seem hung up on DEBUG here.  The goal of CASSANDRA-10241 was
to clean up the system.log so that it a very high “signal” in terms of what was 
logged
to it synchronously, but without reducing the ability of the logs to allow 
people to
solve problems and perform post mortem analysis of issues.  We have 
informational
log messages that are very useful to understanding the state of things, like 
compaction
status, repair status, flushing, or the state of gossip in the system that are 
very useful to
operators, but if they are all in the system.log make said log file harder to 
look over for
issues.  In 10241 the method chosen for how to keep these log messages around by
default, but get them out of the system.log was that these messages were 
changed from
INFO to DEBUG and the new debug.log was created.

From the discussion here it seems that many would like to change how this 
works.  Rather
than just turning off the debug.log I would propose that we switch to using the 
SLF4J
MARKER[1] ability to move the messages back to INFO but tag them as belonging to
the asynchronous_system.log rather than the normal system.log.

[1] https://logback.qos.ch/manual/layouts.html#marker 
<https://logback.qos.ch/manual/layouts.html#marker>
https://www.slf4j.org/faq.html#fatal <https://www.slf4j.org/faq.html#fatal>


> On Mar 19, 2018, at 9:01 AM, Stefan Podkowinski <s...@apache.org> wrote:
> 
> I'd agree that INFO should be the default. Turning on the DEBUG logging
> can cause notable performance issues and I would not enable it on
> production systems unless I really have to. That's why I created 
> CASSANDRA-12696 for 4.0, so you'll be able to at least only partially
> enable DEBUG based on what's relevant to look at, e.g. `nodetool
> setlogginglevel bootstrap DEBUG`.
> 
> But small improvements like that won't change the fact that log files
> suck in general for more complex analysis, except for trivial tailing
> and grepping. You have to make sure that logging is enabled and old
> records you're interested in will not be rotated out. Then you have to
> gather log files from individual nodes somehow. Eventually I end up with
> a local tarball with logs in that situation and the fun starts creating
> hacky, regex loaded Python scripts to parse them. As each log message is
> limited to a single line of text, it's often missing out relevant
> details. You also got to create different parsers for different messages
> of course. It's just inefficient and too time consuming to gather
> information that way. Let alone implementing more advanced monitoring
> solutions on top of that.
> 
> That's exactly why I started working on the "diagnostic events"
> (CASSANDRA-12944) idea more than a year ago. There's also support for
> persistency (CASSANDRA-13460) that would implement storing important but
> infrequent events as rich json objects in a local keyspace and allows
> retrieving them by using CQL. I still like the idea and think it's worth
> pursuing.
> 
> 
> On 19.03.18 09:53, Alain RODRIGUEZ wrote:
>> Hello,
>> 
>> I am not developing Cassandra, but I am using it actively and helping
>> people to work with it. My perspective might be missing some code
>> considerations and history as I did not go through the ticket where this
>> 'debug' level was added by default. But here is a feedback after upgrading
>> a few clusters to Cassandra 2.2:
>> 
>> When upgrading a cluster to Cassandra 2.2, 'disable the debug logs' is in
>> my runbook. I mean, very often, when some cluster is upgraded to Cassandra
>> 2.2 and has problems with performances, the 2 most frequent issues are:
>> 
>> - DEBUG level being turned on
>> - and / or dynamic snitching being enabled
>> 
>> This is especially true for high percentile (very clear on p99). Let's put
>> the dynamic snitch aside as it is not our topic here.
>> 
>> From an operational perspective, I prefer to set the debug level to 'DEBUG'
>> when I need it than having, out of the box, something that is unexpected
>> and impact performances. Plus the debug level can be changed without
>> restarting the node, through 'JMX' or even using 'nodetool' now.
>> 
>> Also in most cases, the 'INFO' level is good enough for me to detect most
>> of the issues. I was even able to recreate a detailed history of events for
>> a customer recently, 'INFO' logs are already very powerful and complete I
>> believe (nice work on this by the way). Then monitoring is helping a lot
>> too. I did not have to use debug logs for a long time. It might happen, but
>> I will find my way to enable them.
>> 
>> Even though it feels great to be able to help people with that easily
>> because the cause is often the same and turning off the logs is a
>> low hanging fruit in C*2.2 clusters that have very nice results and is easy
>> to achieve, I would prefer people not to fall into these performances traps
>> in the first place. In my head, 'Debug' logs should be for debug purposes
>> (by opposition to 'always on'). It seems legit. I am surprised this brings
>> so many discussions I thought this was a common standard widely accepted,
>> and beyond Cassandra. That being said, it is good to see those exchanges
>> are happening, so the decision that will be taken will be a good one, I am
>> sure. I hope this comment will help, I have no other goal, for sure I am
>> not willing to feed a conflict but a talk and I hope no one felt offended
>> by this feedback. I believe this change was made aiming at
>> helping/improving things, but it turns out it is more of an annoyance than
>> truly helpful (my personal perspective).
>> 
>> I would +1 on making 'INFO' default again, but if someone is missing
>> information that should be in 'INFO'. If some informations are missing at
>> the 'INFO' level, why not add informations that should be at the 'INFO'
>> level there directly and keep log levels meaningful? Making sure we do not
>> bring the logs degrading performances from 'Debug' to 'Info' as much as we
>> can.
>> 
>> Hope this is useful,
>> 
>> C*heers,
>> 
>> -----------------------
>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>> France / Spain
>> 
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>> 
>> 2018-03-19 2:18 GMT+00:00 kurt greaves <k...@instaclustr.com>:
>> 
>>> On the same page as Michael here. We disable debug logs in production due
>>> to the performance impact. Personally I think if debug logging is necessary
>>> for users to use the software we're doing something wrong. Also in my
>>> experience, if something doesn't reproduce it will not get fixed. Debug
>>> logging helps, but I've never seen a case where an actual bug simply
>>> *doesn't* reproduce eventually, and I'm sure if this issue appears debug
>>> logging could be enabled after the fact for the relevant classes and
>>> eventually it will reoccur and we could solve the problem. I've never seen
>>> a user say no to helping debug a problem with patched jars/changes to their
>>> system (like logging). I'd much prefer we pushed that harder rather than
>>> just saying "Everyone gets debug logging!". I'm also really not sold that
>>> debug logging saves the day. To me it mostly just speeds up the
>>> identification process, it generally doesn't expose magical information
>>> that wasn't available before, you just had to think about it a bit more.
>>> 
>>> 
>>> In a way the real issue might be that we don’t have nightly performance
>>>> runs that would make an accidentally introduced debug statement obvious.
>>> This is an issue, but I don't think it's the *real* issue. As already
>>> noted, debug logging is for debugging, which normal users shouldn't have to
>>> think about when they are just operating the software. We shouldn't risk
>>> performance regressions just for developer convenience.
>>> 
>>> On 19 March 2018 at 00:55, Ariel Weisberg <adwei...@fastmail.fm> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> In a way the real issue might be that we don’t have nightly performance
>>>> runs that would make an accidentally introduced debug statement obvious.
>>>> 
>>>> A log statement that runs once or more per read or write should be easy
>>> to
>>>> spot. I haven’t measured the impact though. And as a bonus by having this
>>>> we can spot a variety of performance issues introduced by all kinds of
>>>> changes.
>>>> 
>>>> Ariel
>>>> 
>>>>> On Mar 18, 2018, at 3:46 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>> 
>>>>> In Cassandra-10241 I said I was torn on this whole ticket, since most
>>>> people would end up turning it off if it had a negative impact. You said:
>>>>> “I'd like to emphasize that we're not talking about turning debug or
>>>> trace on for client-generated request paths. There's way too much data
>>>> generated and it's unlikely to be useful.
>>>>> What we're proposing is enabling debug logging ONLY for cluster state
>>>> changes like gossip and schema, and infrequent activities like repair. “
>>>>> Clearly there’s a disconnect here - we’ve turned debug logging on for
>>>> everything and shuffled some stuff to trace, which is a one time action
>>> but
>>>> is hard to protect against regression. In fact, just looking at the read
>>>> callback shows two instances of debug log in the client request path
>>>> (exercise for the reader to “git blame”).
>>>>> Either we can go clean up all the surprises that leaked through, or we
>>>> can turn off debug and start backing out some of the changes in 10241.
>>>> Putting stuff like compaction in the same bucket as digest mismatch and
>>>> gossip state doesn’t make life materially better for most people.
>>>>> --
>>>>> Jeff Jirsa
>>>>> 
>>>>> 
>>>>>> On Mar 18, 2018, at 11:21 AM, Jonathan Ellis <jbel...@gmail.com>
>>> wrote:
>>>>>> That really depends on whether you're judicious in deciding what to
>>> log
>>>> at
>>>>>> debug, doesn't it?
>>>>>> 
>>>>>> On Sun, Mar 18, 2018 at 12:57 PM, Michael Kjellman <
>>> kjell...@apple.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> +1. this is how it works.
>>>>>>> 
>>>>>>> your computer doesn’t run at debug logging by default. your phone
>>>> doesn’t
>>>>>>> either. neither does your smart tv. your database can’t be running at
>>>> debug
>>>>>>> just because it makes our lives as engineers easier.
>>>>>>> 
>>>>>>>> On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski <
>>>>>>> a...@thelastpickle.com> wrote:
>>>>>>>> It's a tiny bit unusual to turn on debug logging for all users by
>>>> default
>>>>>>>> though, and there should be occasions to turn it on when facing
>>> issues
>>>>>>> that
>>>>>>>> you want to debug (if they can be easily reproduced).
>>>>>> 
>>>>>> --
>>>>>> Jonathan Ellis
>>>>>> co-founder, http://www.datastax.com
>>>>>> @spyced
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>> 
>>>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

Reply via email to