[ 
https://issues.apache.org/jira/browse/CASSANDRA-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-5183.
-----------------------------------------

       Resolution: Duplicate
    Fix Version/s:     (was: 1.2.2)

Seems like 4 months is the limit of my memory: this is the same as 
CASSANDRA-4671.
                
> Improve cases where we purge tombstone on (minor) compaction
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-5183
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5183
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> Currently, to be able to purge a tombstone, we check that the row it is part 
> of is not present in a non-compacted sstable, as we should not remove a 
> tombstone that may delete other columns in the non-compacted sstables.
> The (known) problem is, if you regularly update a row on which you've made 
> deletes, tombstones may theoretically be kept forever unless you run a major 
> compaction (which is bad and not even a possibility with leveled compaction).
> In practice, with wide rows and more precisely time-series type of load, it 
> is not unlikely that tombstones might be kept, if not forever, at least much 
> longer than gcgrace.
> One way to improve on that would be to start storing the minTimestamp of 
> sstables (like we keep the maxTimestamp). During compaction, on top of 
> checking bloom filters, we would also check if the max timestamp of what 
> we're about to purge is smaller than the min timestamp of the non-compacted 
> sstables. If it is, then whatever tombstone we are looking at cannot shadow 
> something in the non-compacted sstables and we can purge it (that is, even 
> if the row in question may have columns in those non-compacted sstables).
> Note that while this isn't perfect in theory:
> # this is cheap to check. We may even compute the min timestamp of the 
> non-compacted sstables once at the beginning of the compaction and check 
> that before looking at the BF, which may save a few intervalTree searches 
> (if we do end up doing the intervalTree search however, we might still want 
> to recompute the min timestamp of the returned sstables, as this may be 
> bigger than the min timestamp of all the non-compacted sstables).
> # both size tiered and leveled naturally tend to compact sstables having 
> data of roughly the same age, so this should work reasonably well.
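
The check proposed in the description above could be sketched roughly as follows. This is only an illustration in Python, not Cassandra code: `SSTable`, `may_contain`, `global_min_timestamp` and `can_purge` are hypothetical names, the `keys` set stands in for the bloom filter / intervalTree lookup, and the gcgrace check is assumed to have passed already.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for the metadata of one non-compacted sstable."""
    name: str
    min_timestamp: int     # the per-sstable minTimestamp the ticket proposes to store
    keys: FrozenSet[str]   # stands in for the bloom filter / intervalTree lookup

    def may_contain(self, key: str) -> bool:
        return key in self.keys


def global_min_timestamp(non_compacted: Sequence[SSTable]) -> Optional[int]:
    """Computed once at the beginning of a compaction; None if there are no sstables."""
    return min((s.min_timestamp for s in non_compacted), default=None)


def can_purge(key: str, tombstone_max_ts: int, non_compacted: Sequence[SSTable]) -> bool:
    """True if tombstones for `key` with the given max timestamp can be dropped."""
    # Cheap pre-check: older than everything outside the compaction,
    # so the tombstone cannot shadow anything there. No lookup needed.
    global_min = global_min_timestamp(non_compacted)
    if global_min is None or tombstone_max_ts < global_min:
        return True

    # Otherwise do the lookup for sstables that may hold the row.
    overlapping = [s for s in non_compacted if s.may_contain(key)]
    if not overlapping:
        return True  # existing rule: row absent from all non-compacted sstables

    # Recompute the min timestamp over the returned sstables only; it may be
    # bigger than the global minimum, which lets more tombstones be purged.
    return tombstone_max_ts < min(s.min_timestamp for s in overlapping)
```

For example, with `old = SSTable("old", 10, frozenset({"k1"}))` and `new = SSTable("new", 100, frozenset({"k1", "k2"}))`, a tombstone with max timestamp 60 is kept for `k1` (it could shadow data in `old`) but purged for `k2`, because only `new` may contain that row and its min timestamp is 100.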
