[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out

Ariel Weisberg (JIRA) Fri, 02 Oct 2015 15:33:13 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941900#comment-14941900
 ]


Ariel Weisberg commented on CASSANDRA-7392:
-------------------------------------------

bq. I've added number of operations and interval and made the two messages 
partially identical, is this what you meant by "sync"? 
Bear in mind that the no spam logger will only log once every 15 minutes 
however.
So the weird thing with using NoSpamLogger is that in the info system.log it 
will report the # of timed out queries for the last N milliseconds, but that 
won't be the same as the number of queries since it last logged.

As a logged metric it is a weird way to sample. Every N seconds you log what 
occurred in the last P milliseconds.

The other thing is that the log message says check the debug log, but where in 
the debug log? If they don't occur at the same timestamp it makes it a little 
less obvious, although it's not critical. For instance, what text would they 
search for to find it if they aren't already familiar with the log output?

It kind of seems like they should both log at the same frequency.  I think 
NoSpamLogger is not properly supporting the idiom we want here, which is to 
check whether it is time to log, and then do the work of formatting, 
resetting the counters, draining the queue, etc. if it is. The question is 
whether you want to get crazy and invert control and just pass NoSpamLogger a 
closure and an SES to completely formalize the idiom. Maybe at the same time 
also formalize the high detail/low detail aspect of how we are logging at two 
different levels.
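
To make the idiom concrete, here is a minimal sketch of "check whether it is 
time to log, and only then format and reset". IntervalReporter, recordDropped 
and maybeReport are hypothetical names invented for illustration. They are not 
part of NoSpamLogger or of the patch, and the caller is assumed to pass in a 
System.nanoTime() style clock value.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper for illustration only; not Cassandra's NoSpamLogger. */
final class IntervalReporter
{
    private final long intervalNanos;
    private final AtomicLong lastReportNanos;
    private final AtomicLong dropped = new AtomicLong();

    IntervalReporter(long interval, TimeUnit unit, long nowNanos)
    {
        this.intervalNanos = unit.toNanos(interval);
        this.lastReportNanos = new AtomicLong(nowNanos);
    }

    /** Called on the hot path whenever an operation times out. */
    void recordDropped()
    {
        dropped.incrementAndGet();
    }

    /**
     * Returns a message if the interval has elapsed, otherwise null.
     * The counter is drained and the message formatted only when it is
     * actually time to log, so the count always covers the whole period
     * since the previous report.
     */
    String maybeReport(long nowNanos)
    {
        long last = lastReportNanos.get();
        if (nowNanos - last < intervalNanos)
            return null; // not time yet: no formatting, no reset

        // Only one thread wins the right to report for this interval.
        if (!lastReportNanos.compareAndSet(last, nowNanos))
            return null;

        long count = dropped.getAndSet(0);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - last);
        return count + " operations timed out in the last " + elapsedMs + " ms";
    }
}
```

With something like this, the info and debug messages could both be produced 
from the same maybeReport call, so they would share a timestamp and cover the 
same window. Inverting control would then amount to scheduling maybeReport on 
an SES instead of calling it from the loop.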

[numDroppedOperations needs to be an 
AtomicLong.|https://github.com/apache/cassandra/commit/97d53cc56b75ec64ee25571202213bb35d36ecc2#diff-e06002c30313f8ead63ee472617d1b10R114]

[TODO document 
me|https://github.com/apache/cassandra/compare/trunk...stef1927:7392#diff-de1feec9efb986c341fd529741c30e3eR30]

[If you log at info isn't it going to end up in system.log and 
debug.log?|https://github.com/apache/cassandra/commit/97d53cc56b75ec64ee25571202213bb35d36ecc2#diff-e06002c30313f8ead63ee472617d1b10R160]

bq. No, we still need it when adding a timeout to the same failed operation.
Isn't it redundant with totalTime? You could just add totalTime instead of 
retaining failedAt.

[Here is what I was thinking of loop idiom wise. This puts all the control flow 
together at the top of the 
loop.|https://github.com/apache/cassandra/commit/1077ec7775fc61610e70d4fc67137c58f09f13a1]


> Abort in-progress queries that time out
> ---------------------------------------
>
>                 Key: CASSANDRA-7392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node 
> is overloaded) but not queries that time out while being processed.  
> (Particularly common for index queries on data that shouldn't be indexed.)  
> Adding the latter and logging when we have to interrupt one gets us a poor 
> man's "slow query log" for free.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
