[jira] [Comment Edited] (CASSANDRA-7392) Abort in-progress queries that time out

Ariel Weisberg (JIRA) Mon, 27 Jul 2015 11:54:50 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643077#comment-14643077
 ]


Ariel Weisberg edited comment on CASSANDRA-7392 at 7/27/15 6:54 PM:
--------------------------------------------------------------------

-Not done reviewing. Just wanted to drop a thought on design/performance.- 
Looks awesome. I aborted a read query the other day with kill -9.

Right now it schedules a monitoring task for every operation. This has a cost 
linear to the number of operations and adds a contended queue with a not great 
queue implementation (priority queue). My wild guess is that there may be a 
workload we care about that will highlight this. If we do our jobs it should 
bottleneck eventually as it is shared state accessed for every single operation.

What I would suggest is inverting the process. Have a thread wake up 
periodically and walk the list of in progress operations looking for ones that 
need to be timed out. You can store the in progress operation in a thread 
local. When the thread local is created insert the value container into a COW 
array list that the monitoring thread can iterate. That way there is no 
mutation of shared state and it only enters shared state periodically when the 
the monitoring thread walks the list.

This alternative approach has a cost linear to the number of threads 
dispatching tasks and almost no coherence cost.

Just to validate my understanding, when a query is aborted it either sends no 
response or sends an error response? I just want to be sure that it doesn't 
silently claim there is no more data.

The logging in OpMonitor isn't rate limited or aggregating. One super fancy 
thing to do would be to count all the timed out operations for the same name 
operation and report them periodically as a count. It's not urgent, but finding 
a way to report on what happened that is less verbose, but still useful would 
be good. Reporting on how long it took the timeout to fire is useful because it 
can give you an idea of how timely the expiration thread is which can hint at 
external pauses.

ReadCommandVerbHandler constructs an OpState with a string that is the CQL 
command. It would be nice not to have to do any string construction there and 
if we did maybe try and cache it somehow. Maybe don't generate the string 
eagerly, save it for when there is a timeout and you actually have to format 
it? If you can avoid creating an supplier wrapper that is a plus as well. Maybe 
have the thing that generates the string implement an interface for it and 
supply that to the OpState.

In DroppableRunnable you switch from System.currentTimeMillis() to nanoTime(). 
I always looking for opportunities to avoid getting the current time (get it 
once and pass it around). We grab the time once when reading a message so 
theoretically if there has been no blocking you don't need to grab it again if 
we had ponies.

WRT to OpState. Use unsafe instead of AtomicReferenceFieldUpdater. ATRFU is not 
super fast last I heard. If you are in the business of looking to avoid extra 
objects/indirection is it possible to stuff the fields inside an existing 
object that is present everywhere OpState is already used? mixins where art 
thou.

The OpMonitoring tests don't wait long enough. I would write those to sleep, 
and then spin for a looong time checking to see if the cancellation took place. 
The reason I write this way is that I don't want a huge hiccup on the test node 
to randomly throw up a failing test. It will run fast most of the time since it 
is spinning.


was (Author: aweisberg):
Not done reviewing. Just wanted to drop a thought on design/performance. Right 
now it schedules a monitoring task for every operation. This has a cost linear 
to the number of operations and adds a contended queue with a not great queue 
implementation (priority queue). My wild guess is that there may be a workload 
we care about that will highlight this. If we do our jobs it should bottleneck 
eventually as it is shared state accessed for every single operation.

What I would suggest is inverting the process. Have a thread wake up 
periodically and walk the list of in progress operations looking for ones that 
need to be timed out. You can store the in progress operation in a thread 
local. When the thread local is created it inserts into a COW array list that 
the monitoring thread can iterate. That way there is no mutation of shared 
state and it only enters shared state periodically when the the monitoring 
thread walks the list.

This alternative approach has a cost linear to the number of threads 
dispatching tasks and almost no coherence cost.

Just to validate my understanding, when a query is aborted it either sends no 
response or sends an error response? I just want to be sure that it doesn't 
silently claim there is no more data.

The logging in OpMonitor isn't rate limited or aggregating. One super fancy 
thing to do would be to count all the timed out operations for the same name 
operation and report them periodically as a count. It's not urgent, but finding 
a way to report on what happened that is less verbose, but still useful would 
be good. Reporting on how long it took the timeout to fire is useful because it 
can give you an idea of how timely the expiration thread is which can hint at 
external pauses.

ReadCommandVerbHandler constructs an OpState with a string that is the CQL 
command. It would be nice not to have to do any string construction there and 
if we did maybe try and cache it somehow.

In DroppableRunnable you switch from System.currentTimeMillis() to nanoTime(). 
I always looking for opportunities to avoid getting the current time (get it 
once and pass it around). We grab the time once when reading a message so 
theoretically if there has been no blocking you don't need to grab it again if 
we had ponies.

WRT to OpState. Use unsafe instead of AtomicReferenceFieldUpdater. ATRFU is not 
super fast last I heard. If you are in the business of looking to avoid extra 
objects/indirection is it possible to stuff the fields inside an existing 
object that is present everywhere OpState is already used? mixins where art 
thou.

The OpMonitoring tests don't wait long enough. I would write those to sleep, 
and then spin for a looong time checking to see if the cancellation took place. 
The reason I write this way is that I don't want a huge hiccup on the test node 
to randomly throw up a failing test. It will run fast most of the time since it 
is spinning.

> Abort in-progress queries that time out
> ---------------------------------------
>
>                 Key: CASSANDRA-7392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>             Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node 
> is overloaded) but not queries that time out while being processed.  
> (Particularly common for index queries on data that shouldn't be indexed.)  
> Adding the latter and logging when we have to interrupt one gets us a poor 
> man's "slow query log" for free.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-7392) Abort in-progress queries that time out

Reply via email to