[
https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593843#comment-13593843
]
Rick Branson commented on CASSANDRA-3929:
-----------------------------------------
[~liqusha]: What I mean is that in order to DELETE only the tail, Cassandra
will have to read the entire row. For instance, if your minimum retention
requirement is ~500 columns, then in order to find any columns after the
500th, the following operations must be performed:
* All of the columns are read from the SSTable files that contain columns for
that row.
* These row fragments are "merged" (re-sorting by Comparator, tombstone
removal, etc.).
* Tombstones are inserted for each column "after" the 500th.
* As time goes on and tombstones build up (before GC grace), this operation
gets more and more expensive, and compaction performance also suffers.
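A rough sketch of those steps, as a toy Python model rather than Cassandra's
actual code. Everything named here (`merge_fragments`, `tail_delete`, the
dict-based "SSTables", the `TOMBSTONE` sentinel) is hypothetical and only
meant to show why each tail DELETE has to touch the whole row, and why the
tombstones it writes make the next pass more expensive:

```python
# Toy model: each "SSTable" is a dict of column name -> (timestamp, value).
# TOMBSTONE is a hypothetical sentinel standing in for a deletion marker.
TOMBSTONE = object()


def merge_fragments(sstables):
    """Merge the row fragments; the newest write for each column wins."""
    merged = {}
    for fragment in sstables:
        for name, (timestamp, value) in fragment.items():
            if name not in merged or timestamp > merged[name][0]:
                merged[name] = (timestamp, value)
    return merged


def tail_delete(sstables, keep, now):
    """Return (columns_read, tombstones) for trimming a row to `keep` columns.

    Every column in every fragment is read, the fragments are merged, and a
    tombstone is produced for each live column after the first `keep`
    (sorted by name, standing in for the Comparator order).
    """
    columns_read = sum(len(fragment) for fragment in sstables)
    merged = merge_fragments(sstables)
    live = sorted(n for n, (_, v) in merged.items() if v is not TOMBSTONE)
    tombstones = {name: (now, TOMBSTONE) for name in live[keep:]}
    return columns_read, tombstones
```

In this model the tombstones are themselves written out as a new fragment, so
a second call to `tail_delete` reads the original columns plus every
tombstone that has not yet been purged.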
By "free" I don't just mean avoiding the need to perform the DELETE operation;
I mean that supporting this feature doesn't add any extra cost burden.
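As an illustration only, here is a toy Python sketch of one way that could
work, assuming the limit is enforced while compaction is already merging the
row (the comment does not spell out the mechanism). `compact_row` and its
dict-based fragments are hypothetical names, not Cassandra APIs:

```python
def compact_row(fragments, limit):
    """Merge row fragments and keep only the first `limit` columns.

    Each fragment is a dict of column name -> (timestamp, value). The merge
    is work compaction does anyway; the size limit is just a slice over the
    already-sorted result, so no extra read pass or tombstone is needed.
    """
    merged = {}
    for fragment in fragments:
        for name, (timestamp, value) in fragment.items():
            # Newest write for a column wins, as in a normal merge.
            if name not in merged or timestamp > merged[name][0]:
                merged[name] = (timestamp, value)
    kept_names = sorted(merged)[:limit]  # sorted() stands in for the Comparator
    return {name: merged[name] for name in kept_names}
```

Columns beyond the limit are simply not written to the compacted output in
this sketch, which is the sense in which the trimming costs nothing extra.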
As for use cases, they vary quite a bit. There are many I can imagine for
persistent storage with a per-user quota that auto-evicts old data over time
at low cost. Even for "big data" scenarios, the cost of
computing still goes up as the data size grows. For instance, a database used
to store objects a user interacted with for performing collaborative filtering
only needs a sample. In real-world use cases, these types of algorithms really
need a relatively bounded set of data, and user taste might change over time,
so only taking into consideration the most recent 90 objects makes sense.
TTL'ing this data also doesn't make sense, because there is a wide range of
frequencies at which users might generate this data.
[~slebresne]: I spent a few hours digging through the compaction source, and
it's going to be messy to do this, probably involving a lot of copy+paste, so
I'm even more +1 on disaggregating that massive Runnable method in
CompactionTask into something more pluggable / extensible.
> Support row size limits
> -----------------------
>
> Key: CASSANDRA-3929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Labels: ponies
> Fix For: 2.0
>
> Attachments: 3929_b.txt, 3929_c.txt, 3929_d.txt, 3929_e.txt,
> 3929_f.txt, 3929_g_tests.txt, 3929_g.txt, 3929.txt
>
>
> We currently support expiring columns by time-to-live; we've also had
> requests for keeping the most recent N columns in a row.