[ 
https://issues.apache.org/jira/browse/CASSANDRA-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990658#comment-12990658
 ] 

Chris Goffinet commented on CASSANDRA-2109:
-------------------------------------------

Thoughts about this:

Maybe a histogram? A few scenarios could produce fast responses even on a 
failing node:

1) Bloom filter misses
2) Row cache hits
3) Data in the page cache returning quickly
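
To make the histogram idea concrete, here is a rough sketch and not a proposal 
for the actual DES code. The class names, the bucket bounds, and the 
100-sample window used for comparison are all assumptions made for 
illustration. It shows how a fixed-size window forgets slow requests once 
enough fast ones arrive, while bucketed counts keep them visible.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: none of these names exist in Cassandra.
final class LatencyHistogram {
    // Assumed bucket upper bounds in ms; one extra bucket catches anything slower.
    static final long[] BOUNDS_MS = {1, 10, 100, 1000, 10000};
    private final long[] counts = new long[BOUNDS_MS.length + 1];

    void record(long latencyMs) {
        int bucket = 0;
        while (bucket < BOUNDS_MS.length && latencyMs > BOUNDS_MS[bucket]) {
            bucket++;
        }
        counts[bucket]++;
    }

    // Number of samples in buckets that lie entirely above boundMs.
    long countSlowerThan(long boundMs) {
        long total = 0;
        for (int bucket = 1; bucket < counts.length; bucket++) {
            if (BOUNDS_MS[bucket - 1] >= boundMs) {
                total += counts[bucket];
            }
        }
        return total;
    }
}

// A plain bounded window of recent latencies, for comparison.
final class SlidingWindow {
    private final int capacity;
    private final ArrayDeque<Long> samples = new ArrayDeque<>();

    SlidingWindow(int capacity) {
        this.capacity = capacity;
    }

    void record(long latencyMs) {
        if (samples.size() == capacity) {
            samples.removeFirst(); // the oldest sample is pushed out
        }
        samples.addLast(latencyMs);
    }

    double mean() {
        return samples.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }
}
```

Feeding both structures a handful of 10 second requests followed by 100 
requests of 10 ms leaves the window mean looking healthy, while the histogram 
still reports the slow samples. A real version would also need some form of 
decay or periodic reset, which this sketch leaves out.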

We've seen disk failures fall into two scenarios: responses timing out because 
the disk just never returned, and fast fail. We account for the first scenario 
but not the fast fail case. When a fast fail happens, the bad node throws an 
IOError immediately, and the expired map on the coordinator eventually kicks 
in to adjust scores. If we do nothing on the bad node, we are assuming people 
have smart clients (which I hope they do) that remove the bad node from their 
list after enough timeouts. We should most likely catch the IOError and throw 
a special error to the client so it knows the node is Unavailable and the 
smart client can make a decision. Otherwise it will just get the generic 
error or a timeout.
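
As a rough illustration of catching the IOError and surfacing something more 
specific, a minimal sketch follows. NodeUnavailableException and ReadGuard are 
hypothetical names invented for this example, and the real read path and 
client-facing error type would look different.

```java
import java.io.IOError;
import java.util.function.Supplier;

// Hypothetical error type a smart client could recognise; not a Cassandra class.
final class NodeUnavailableException extends RuntimeException {
    NodeUnavailableException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical wrapper around a local read.
final class ReadGuard {
    static <T> T read(Supplier<T> localRead) {
        try {
            return localRead.get();
        } catch (IOError e) {
            // Fast fail: translate the disk error instead of letting the
            // client see a generic error or wait for a timeout.
            throw new NodeUnavailableException("local disk read failed", e);
        }
    }
}
```

The original IOError is kept as the cause so nothing is lost for logging on 
the node itself.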

I am a little inclined to say that if a node is seeing a series of IOErrors 
locally, it should put itself into a failed state and stop accepting traffic. 
That might be a little scary for some, though. Thoughts?
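
One way the self-failing idea could look, purely as a sketch: count 
consecutive IOErrors and flip to a failed state once a threshold is reached. 
DiskHealth, its method names, and the threshold are all assumptions for 
illustration. Whether the failed state should be sticky, as it is here, is 
exactly the kind of policy question that is open.

```java
// Hypothetical sketch: not an existing Cassandra class.
final class DiskHealth {
    private final int maxConsecutiveErrors; // assumed, operator-chosen threshold
    private int consecutiveErrors = 0;
    private boolean failed = false;

    DiskHealth(int maxConsecutiveErrors) {
        this.maxConsecutiveErrors = maxConsecutiveErrors;
    }

    synchronized void onReadSuccess() {
        consecutiveErrors = 0; // a good read breaks the series
    }

    synchronized void onIOError() {
        consecutiveErrors++;
        if (consecutiveErrors >= maxConsecutiveErrors) {
            failed = true; // sticky: stays failed until an operator steps in
        }
    }

    synchronized boolean acceptingTraffic() {
        return !failed;
    }
}
```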

> Improve default window size for DES
> -----------------------------------
>
>                 Key: CASSANDRA-2109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2109
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>            Priority: Minor
>              Labels: des
>             Fix For: 0.8
>
>
> The window size for DES is currently hardcoded at 100 requests. A larger 
> window means that it takes longer to react to a suddenly slow node, but that 
> you have a smoother transition for scores.
> An example of bad behaviour: with a window of size 100, we saw a case with a 
> failing node where if enough requests could be answered quickly out of cache 
> or bloomfilters, the window might be momentarily filled with 10 ms requests, 
> pushing out requests that had to go to disk and took 10 seconds.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
