[ https://issues.apache.org/jira/browse/CASSANDRA-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786662#action_12786662 ]

Ramzi Rabah commented on CASSANDRA-604:
---------------------------------------

Thinking about it some more, there is another case in which data can be lost. In
the case above, the file containing the tombstone was compacted on its own,
before the data file.
The second case is that the file containing the data is compacted on its own,
before the tombstone is compacted.

So in both cases, the only viable solution I can think of is to remove
tombstones only when every single SSTable file for the column family is
compacted together (i.e. a major compaction). Otherwise, the tombstone should
stick around.
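A minimal sketch of the rule proposed above, in Java to match the quoted code. It assumes SSTables can be identified by name and that a table's rows can be reduced to a map from row key to a tombstone flag; `MajorOnlyPurge`, `canPurgeTombstones` and `merge` are hypothetical names, not Cassandra's API. Tombstones are dropped only when the compaction covers every SSTable of the column family.

```java
import java.util.*;

// Hypothetical sketch: purge tombstones only in a major compaction.
// MajorOnlyPurge, canPurgeTombstones() and merge() are illustrative names, not Cassandra's API.
public class MajorOnlyPurge {
    // True only if this compaction covers every SSTable of the column family, so no file
    // outside it can still hold data that one of its tombstones deletes.
    static boolean canPurgeTombstones(Set<String> compacting, Set<String> allSSTables) {
        return compacting.containsAll(allSSTables);
    }

    // Merge rows (key -> true for a tombstone, false for live data). Assumes a tombstone is
    // newer than any data it deletes. Tombstones are dropped only when purge is true.
    static Map<String, Boolean> merge(List<Map<String, Boolean>> inputs, boolean purge) {
        Map<String, Boolean> merged = new HashMap<>();
        for (Map<String, Boolean> rows : inputs) {
            rows.forEach((k, tomb) -> merged.merge(k, tomb, Boolean::logicalOr));
        }
        if (purge) {
            merged.values().removeIf(tomb -> tomb);
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("old-data", "tombstones-0", "tombstones-1", "tombstones-2", "tombstones-3");
        Set<String> minor = Set.of("tombstones-0", "tombstones-1", "tombstones-2", "tombstones-3");
        boolean purgeMinor = canPurgeTombstones(minor, all);  // false: old-data is not included
        boolean purgeMajor = canPurgeTombstones(all, all);    // true: every SSTable is included

        // Minor compaction of the tombstone files: the tombstone for "k" is kept.
        List<Map<String, Boolean>> tombstoneFiles = List.of(Map.of("k", true), Map.of("x1", true));
        System.out.println(merge(tombstoneFiles, purgeMinor).get("k"));  // true

        // Major compaction also sees the old data, so the data and its tombstone go together.
        System.out.println(merge(List.of(Map.of("k", false), Map.of("k", true)), purgeMajor).isEmpty());  // true
    }
}
```

Under this rule the minor compaction of the four tombstone files keeps the tombstone for `k`; the trade-off is that tombstones stay on disk until the next major compaction.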

Does that make sense?

> Compactions might remove tombstones without removing the actual data
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-604
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-604
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>         Environment: Cent-OS
>            Reporter: Ramzi Rabah
>             Fix For: 0.5
>
>
> I was looking at the code for compaction and noticed that when we do
> compactions during the normal course of Cassandra, we call:
>            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>            {
>                if (sstables.size() < minThreshold)
>                {
>                    continue;
>                }
>                // otherwise, compact this bucket...
>            }
> where getCompactionBuckets groups into buckets either very small files or
> files whose sizes are within 0.5-1.5x of each other. It only compacts a
> bucket if it holds at least minThreshold files, which is 4 by default.
> So far so good. Now consider this scenario: I have an old entry that I
> inserted a long time ago, and it was compacted into a 75MB file.
> There are fewer than 4 files of about 75MB, so their bucket never reaches
> the threshold. I do many deletes and end up with 4 extra SSTable files
> filled with tombstones, each about 300MB. These 4 files are compacted
> together, and in the compaction code, if a tombstone is present we don't
> copy it over to the new file. Since we compacted the tombstone files but
> not the 75MB files, the tombstone is gone while the data is still intact
> in the 75MB file. If we compacted all the files together I don't think
> that would be a problem, but since we only compact 4, this can leave the
> deleted data behind.
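To make the quoted scenario concrete, here is a minimal Java sketch; it is a hypothetical model, not Cassandra's code. It assumes an SSTable can be reduced to a size plus a map from row key to a tombstone flag, treats the 50MB argument of the quoted call as the small-file cutoff (an assumption), uses the default minThreshold of 4, and assumes every tombstone is newer than the data it deletes. `SimTable`, `bucket`, `minorCompaction`, `isVisible` and `scenario` are illustrative names. With two 75MB files and four 300MB tombstone files, only the tombstone bucket reaches the threshold.

```java
import java.util.*;

// Hypothetical, simplified model of the CASSANDRA-604 scenario. SimTable, bucket(),
// minorCompaction(), isVisible() and scenario() are illustrative, not Cassandra's API.
public class TombstoneLossDemo {
    static final long MB = 1024L * 1024L;
    static final long SMALL = 50L * MB;  // assumed "small file" cutoff (2nd arg of the quoted call)
    static final int MIN_THRESHOLD = 4;  // default minimum bucket size, per the issue

    // An SSTable reduced to a size and its rows: key -> true (tombstone) or false (live data).
    static final class SimTable {
        final String name;
        final long size;
        final Map<String, Boolean> rows;
        SimTable(String name, long size, Map<String, Boolean> rows) {
            this.name = name; this.size = size; this.rows = rows;
        }
    }

    // Put a table in the first bucket whose running average size is within 0.5x-1.5x
    // of it, or where both the table and the average are "small"; else start a bucket.
    static List<List<SimTable>> bucket(List<SimTable> tables) {
        List<List<SimTable>> buckets = new ArrayList<>();
        List<Long> averages = new ArrayList<>();
        for (SimTable t : tables) {
            boolean placed = false;
            for (int i = 0; i < buckets.size() && !placed; i++) {
                long avg = averages.get(i);
                boolean similar = t.size > avg / 2 && t.size < 3 * avg / 2;
                if (similar || (t.size < SMALL && avg < SMALL)) {
                    buckets.get(i).add(t);
                    averages.set(i, (avg + t.size) / 2);
                    placed = true;
                }
            }
            if (!placed) {
                buckets.add(new ArrayList<>(List.of(t)));
                averages.add(t.size);
            }
        }
        return buckets;
    }

    // Compact each bucket holding at least MIN_THRESHOLD tables. As the issue describes,
    // tombstones are not copied to the output. A merged key is a tombstone if any input has one.
    static List<SimTable> minorCompaction(List<SimTable> tables) {
        List<SimTable> result = new ArrayList<>();
        for (List<SimTable> b : bucket(tables)) {
            if (b.size() < MIN_THRESHOLD) {
                result.addAll(b);  // bucket too small: these files are left as they are
                continue;
            }
            Map<String, Boolean> merged = new HashMap<>();
            long size = 0;
            for (SimTable t : b) {
                size += t.size;
                t.rows.forEach((k, tomb) -> merged.merge(k, tomb, Boolean::logicalOr));
            }
            merged.values().removeIf(tomb -> tomb);  // drop tombstones (and data they shadow here)
            result.add(new SimTable("compacted", size, merged));
        }
        return result;
    }

    // A key reads as live if some table holds live data for it and none holds a tombstone.
    static boolean isVisible(List<SimTable> tables, String key) {
        boolean live = false;
        for (SimTable t : tables) {
            Boolean tomb = t.rows.get(key);
            if (tomb == null) continue;
            if (tomb) return false;
            live = true;
        }
        return live;
    }

    // Two 75MB files (fewer than 4, so never compacted) plus four 300MB tombstone files.
    static List<SimTable> scenario() {
        List<SimTable> tables = new ArrayList<>();
        tables.add(new SimTable("old-data", 75 * MB, Map.of("k", false)));
        tables.add(new SimTable("other-75", 75 * MB, Map.of("j", false)));
        for (int i = 0; i < 4; i++) {
            String key = (i == 0) ? "k" : "x" + i;  // "k" is deleted; the rest are unrelated deletes
            tables.add(new SimTable("tombstones-" + i, 300 * MB, Map.of(key, true)));
        }
        return tables;
    }

    public static void main(String[] args) {
        List<SimTable> before = scenario();
        List<SimTable> after = minorCompaction(before);
        System.out.println("before: k visible = " + isVisible(before, "k"));  // false
        System.out.println("after:  k visible = " + isVisible(after, "k"));   // true
    }
}
```

Running `main` should print `false` before the minor compaction and `true` after it: in this model the tombstone for `k` is purged while the live value for `k` stays in the untouched 75MB file, so the delete is undone.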

-- 
This message is automatically generated by JIRA.
