[ 
https://issues.apache.org/jira/browse/CASSANDRA-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André Cruz updated CASSANDRA-5143:
----------------------------------

    Description: 
When doing a range query on a row with a lot of tombstones, these can quickly 
add up and use too much heap, even if we specify a column count of 2, as the 
tombstones can be between those two live columns. The client API can do 
nothing to prevent this, since there is no way to specify a limit on the 
number of tombstones being collected.

I know that this looks like the "I'm using a row as a queue and building up a 
ton of tombstones" anti-pattern, but Cassandra should still be able to take 
better care of itself so as to prevent a DoS. I can imagine a lot of use cases 
that let users create and delete columns on a row.

I propose a simple safety valve that can act like this: "The client has asked 
me for X nodes, I've already collected X^Y nodes and still have not found X 
live nodes, I should just give up and return an exception". The Y would be the 
configurable parameter. Time taken per query or memory used could also be 
factors to take into consideration.

  was:
When doing a range query on a row with a lot of tombstones, these can quickly 
add up and use too much heap, even if we specify a column count of 2, as the 
tombstones can be between those two live columns. The client API can do 
nothing to prevent this, since there is no way to specify a limit on the 
number of tombstones being collected.

I know that this looks like the "I'm using a row as a queue and building up a 
ton of tombstones" anti-pattern, but Cassandra should still be able to take 
better care of itself so as to prevent a DoS. I can imagine a lot of use cases 
that let users create and delete columns on a row.

I propose a simple safety valve that can act like this: "The client has asked 
me for X nodes, I've already collected X^Y nodes and still have not found X 
live nodes, I should just give up". The Y would be the configurable parameter. 
Time taken per query or memory used could also be factors to take into 
consideration.

    
> Safety valve on number of tombstones skipped on read path to prevent a full 
> heap
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5143
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5143
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.5
>         Environment: Debian Linux, 3 node cluster with RF 3, 8GB heap on 32GB 
> machines
>            Reporter: André Cruz
>
> When doing a range query on a row with a lot of tombstones, these can quickly 
> add up and use too much heap, even if we specify a column count of 2, as the 
> tombstones can be between those two live columns. The client API can do 
> nothing to prevent this, since there is no way to specify a limit on the 
> number of tombstones being collected.
> I know that this looks like the "I'm using a row as a queue and building up a 
> ton of tombstones" anti-pattern, but Cassandra should still be able to take 
> better care of itself so as to prevent a DoS. I can imagine a lot of use 
> cases that let users create and delete columns on a row.
> I propose a simple safety valve that can act like this: "The client has asked 
> me for X nodes, I've already collected X^Y nodes and still have not found X 
> live nodes, I should just give up and return an exception". The Y would be 
> the configurable parameter. Time taken per query or memory used could also be 
> factors to take into consideration.
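
The proposed safety valve could be sketched roughly as follows. This is only 
an illustration of the X^Y rule described above, written in Python rather 
than against Cassandra's actual read path. The names Cell, 
TombstoneOverflowError, slice_live_cells and the exponent parameter are all 
hypothetical, and the sketch counts scanned cells instead of modelling heap 
usage.

```python
from dataclasses import dataclass
from typing import Iterable, List


class TombstoneOverflowError(Exception):
    """Hypothetical error: too many cells scanned without filling the request."""


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a column; live=False marks a tombstone."""
    name: str
    live: bool


def slice_live_cells(cells: Iterable[Cell], count: int, exponent: float) -> List[Cell]:
    """Return up to `count` live cells from `cells`.

    Gives up with an exception once more than count**exponent cells
    (live or tombstoned) have been scanned and the request is still unfilled.
    """
    budget = count ** exponent  # the X^Y threshold from the proposal
    live: List[Cell] = []
    scanned = 0
    for cell in cells:
        if len(live) >= count:
            break  # request satisfied, stop reading
        scanned += 1
        if scanned > budget:
            raise TombstoneOverflowError(
                f"scanned more than {budget} cells, found only "
                f"{len(live)} of {count} live cells"
            )
        if cell.live:
            live.append(cell)
    return live
```

With X = 2 and Y = 3, a row holding one live column, 1000 tombstones and a 
second live column aborts once the scan passes 8 cells, while the same row 
with Y = 10 (a budget of 1024) returns both live columns. One property of the 
rule as stated: for X = 1 the threshold X^Y is always 1 regardless of Y, so a 
real implementation would probably need a minimum budget as well.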

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
