[
https://issues.apache.org/jira/browse/CASSANDRA-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352958#comment-14352958
]
mck commented on CASSANDRA-8574:
--------------------------------
{quote}…because it can be deleted but we need to wait on the non-short retry to
get the tombstone{quote}
If StorageProxy were to return only a page of results known to be covered by a
quorum of replies (i.e. bounded by the smallest returned liveCount), shouldn't
the corresponding tombstones always be /there/ (known to be applied) within
that current page?
That is aside from the poor protection, which I presume means not knowing
exactly which cells we have a quorum reply over, and/or not knowing whether
every reply just had a short read caused by different tombstones (this is
actually quite serious, as it's a way to break consistency/reads). Otherwise
I'm hoping you already have the right fix in mind and will describe it in the
new issue…
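
To make the question above concrete, here is a minimal sketch of a literal reading of the suggestion: reconcile the replies, then return only as many live rows as the smallest liveCount among them. Everything in it is an assumption for illustration: `Cell`, `live_count`, `reconcile` and `quorum_covered_page` are hypothetical names, clustering keys are plain integers, and none of this is StorageProxy's actual code. It does not settle whether truncating by count really identifies the cells that every reply covers, which is the open concern.

```python
from typing import List, NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical stand-in for one cell in a replica reply."""
    key: int               # clustering key (integer ordering assumed)
    timestamp: int         # write timestamp used for reconciliation
    value: Optional[str]   # None marks a tombstone


def live_count(reply: List[Cell]) -> int:
    """Number of live (non-tombstone) cells in one replica's reply."""
    return sum(1 for cell in reply if cell.value is not None)


def reconcile(replies: List[List[Cell]]) -> List[Cell]:
    """Keep the newest cell per clustering key, then drop tombstones.

    On equal timestamps the first cell seen wins; this is a simplification.
    """
    newest = {}
    for reply in replies:
        for cell in reply:
            current = newest.get(cell.key)
            if current is None or cell.timestamp > current.timestamp:
                newest[cell.key] = cell
    merged = sorted(newest.values(), key=lambda c: c.key)
    return [c for c in merged if c.value is not None]


def quorum_covered_page(replies: List[List[Cell]]) -> List[Cell]:
    """Truncate the reconciled live rows to the smallest live count seen."""
    if not replies:
        return []
    smallest = min(live_count(reply) for reply in replies)
    return reconcile(replies)[:smallest]


# Replica A has three live cells; replica B stopped early after a tombstone.
replica_a = [Cell(1, 10, "a"), Cell(2, 10, "b"), Cell(3, 10, "c")]
replica_b = [Cell(1, 10, "a"), Cell(2, 20, None)]
page = quorum_covered_page([replica_a, replica_b])
```

In this example the tombstone from replica B removes key 2, and key 3 is withheld because replica B never replied over it, so only key 1 is returned for the page.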
> Gracefully degrade SELECT when there are lots of tombstones
> -----------------------------------------------------------
>
> Key: CASSANDRA-8574
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8574
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jens Rantil
> Fix For: 3.0
>
>
> *Background:* There is a lot of tooling out there to do BigData analysis on
> Cassandra clusters. Examples are Spark and Hadoop, which is offered by DSE.
> The problem with both of these so far is that a single partition key with
> too many tombstones can make the query job fail hard.
> The described scenario happens despite the user setting a rather small
> FetchSize. I assume this is a common scenario if you have larger rows.
> *Proposal:* To allow a CQL SELECT to gracefully degrade to only return a
> smaller batch of results if there are too many tombstones. The tombstones are
> ordered according to clustering key and one should be able to page through
> them. Potentially:
> SELECT * FROM mytable LIMIT 1000 TOMBSTONES;
> would page through at most 1000 tombstones, _or_ 1000 (CQL) rows.
> I understand that this obviously would degrade performance, but it would at
> least yield a result.
> *Additional comment:* I haven't dug into Cassandra code, but conceptually I
> guess this would be doable. Let me know what you think.