[
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941900#comment-14941900
]
Ariel Weisberg commented on CASSANDRA-7392:
-------------------------------------------
bq. I've added number of operations and interval and made the two messages
partially identical, is this what you meant by "sync"?
Bear in mind that the no spam logger will only log once every 15 minutes
however.
So the weird thing with using NoSpamLogger is that in the info system.log it
will report the # of timed out queries for the last N milliseconds, but that
won't be the same as the number of queries since it last logged.
As a logged metric it is a weird way to sample. Every N seconds you log what
occurred in the last P milliseconds.
The other thing is that the log message says check the debug log, but where in
the debug log? If they don't occur at the same timestamp it makes it a little
less obvious, although it's not critical. For instance, what text would they
search for to find it if they aren't already familiar with the log output?
It kind of seems like they should both log at the same frequency. I think
NoSpamLogger is not properly supporting the idiom we want here, which is to
check whether it is time to log and, if it is, do the work of formatting,
resetting the counters, draining the queue, etc. The open question is whether
you want to get crazy and invert control: just pass NoSpamLogger a closure and
an SES to completely formalize the idiom. Maybe at the same time also formalize
the high detail/low detail aspect of how we are logging at two different levels.
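
A minimal sketch of the "check whether it is time, then do the work" idiom, to make the shape concrete. All names here (IntervalReporter, recordDropped, maybeReport, the two sinks) are hypothetical and are not part of NoSpamLogger's existing API. It assumes the low-detail and high-detail messages should be emitted together and share the same leading text so one can be found from the other.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical helper: counts dropped operations and reports them at most
// once per interval. The count always covers everything since the last report.
class IntervalReporter
{
    private final long minIntervalNanos;
    private final LongSupplier clock;           // e.g. System::nanoTime
    private final AtomicLong nextReportAt;
    private final AtomicLong droppedOperations = new AtomicLong();
    private final ConcurrentLinkedQueue<String> details = new ConcurrentLinkedQueue<>();

    IntervalReporter(long minInterval, TimeUnit unit, LongSupplier clock)
    {
        this.minIntervalNanos = unit.toNanos(minInterval);
        this.clock = clock;
        this.nextReportAt = new AtomicLong(clock.getAsLong());
    }

    // Called from any thread when an operation times out.
    void recordDropped(String description)
    {
        droppedOperations.incrementAndGet();
        details.add(description);
    }

    // Returns false without formatting anything if it is not yet time to log.
    // Only the thread that wins the CAS drains the counter and the queue.
    boolean maybeReport(Consumer<String> summarySink, Consumer<String> detailSink)
    {
        long now = clock.getAsLong();
        long due = nextReportAt.get();
        if (now - due < 0 || !nextReportAt.compareAndSet(due, now + minIntervalNanos))
            return false;

        long dropped = droppedOperations.getAndSet(0);
        String summary = String.format("%d operations timed out since the last report", dropped);

        // The high-detail message starts with the same text as the summary.
        StringBuilder detail = new StringBuilder(summary);
        for (String d; (d = details.poll()) != null; )
            detail.append("\n  ").append(d);

        summarySink.accept(summary);            // low detail, e.g. logger::info
        detailSink.accept(detail.toString());   // high detail, e.g. logger::debug
        return true;
    }
}
```

A caller on an SES would invoke something like maybeReport(logger::info, logger::debug) from a periodic task; whether that belongs inside NoSpamLogger or next to it is the open design question.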
[numDroppedOperations needs to be an
AtomicLong.|https://github.com/apache/cassandra/commit/97d53cc56b75ec64ee25571202213bb35d36ecc2#diff-e06002c30313f8ead63ee472617d1b10R114]
[TODO document
me|https://github.com/apache/cassandra/compare/trunk...stef1927:7392#diff-de1feec9efb986c341fd529741c30e3eR30]
[If you log at info isn't it going to end up in system.log and
debug.log?|https://github.com/apache/cassandra/commit/97d53cc56b75ec64ee25571202213bb35d36ecc2#diff-e06002c30313f8ead63ee472617d1b10R160]
bq. No, we still need it when adding a timeout to the same failed operation.
Isn't it redundant with totalTime? You could just add totalTime instead of
retaining failedAt.
[Here is what I was thinking of loop idiom wise. This puts all the control flow
together at the top of the
loop.|https://github.com/apache/cassandra/commit/1077ec7775fc61610e70d4fc67137c58f09f13a1]
> Abort in-progress queries that time out
> ---------------------------------------
>
> Key: CASSANDRA-7392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Stefania
> Priority: Critical
> Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node
> is overloaded) but not queries that time out while being processed.
> (Particularly common for index queries on data that shouldn't be indexed.)
> Adding the latter and logging when we have to interrupt one gets us a poor
> man's "slow query log" for free.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)