[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Sylvain Lebresne (JIRA) Thu, 05 May 2011 11:14:43 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029474#comment-13029474
 ]


Sylvain Lebresne commented on CASSANDRA-2394:
---------------------------------------------

If there are no exceptions whatsoever in the log, I'm not really sure 
CASSANDRA-2118 would help.

What you're saying is that there are lots of errors in Kern.log, but none in the 
Cassandra log, right?
And when you say "the cluster won't respond to any queries anymore", do you 
mean from any node? And which consistency level are we talking about?

> Faulty hd kills cluster performance
> -----------------------------------
>
>                 Key: CASSANDRA-2394
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2394
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>            Priority: Minor
>             Fix For: 0.7.6
>
>
> Hi,
> About every week, a node in our main cluster (>100 nodes) has a faulty hd
> (listing the cassandra data storage directory triggers an input/output error).
> Whenever this occurs, I see many timeout exceptions in our application on 
> various nodes, which cause everything to run very slowly. Key range scans 
> just time out and sometimes never succeed. If I stop cassandra on the 
> faulty node, everything runs normally again.
> It would be great to have some kind of monitoring thread in cassandra which 
> marks a node as "down" if there are multiple read/write errors to the data 
> directories. A single faulty hd on 1 node shouldn't affect global cluster 
> performance.
