[
https://issues.apache.org/jira/browse/CASSANDRA-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027543#comment-13027543
]
Aaron Morton edited comment on CASSANDRA-2589 at 5/2/11 4:51 AM:
-----------------------------------------------------------------
bq. What's supposed to happen is, isRelevant will suppress those columns (which
may be in an older sstable). We should never require a read (e.g. to load a
list of all-columns-deleted), when doing a write.
I was only thinking about the columns in the memtable. Edit: I think the best
idea is to ignore the columns when serialising the CF to an SSTable.
bq. If you have columns in memory when you do a row deletion, it shouldn't
matter whether we write those out or not, as far as correctness is concerned.
Agreed, this was more of a performance issue: e.g. writing a lot of data and
deleting it quickly (before the memtable flush) with a row delete takes more
disk space than deleting by column path.
CASSANDRA-2590 is where I noticed it breaking correctness.
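The serialisation-time idea from the edit above can be sketched roughly as
follows. Note this is a simplified illustration, not the actual Cassandra
0.7/0.8 internals: the `Column` and `ColumnFamily` classes and the
`columnsToSerialize` method here are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

class Column {
    final String name;
    final long timestamp;
    Column(String name, long timestamp) { this.name = name; this.timestamp = timestamp; }
}

class ColumnFamily {
    // Set by a row-level delete; columns at or below this timestamp are dead.
    long markedForDeleteAt = Long.MIN_VALUE;
    final List<Column> columns = new ArrayList<>();

    // Only columns written after the row tombstone are worth serialising;
    // everything else would be filtered out by isRelevant on read anyway.
    List<Column> columnsToSerialize() {
        List<Column> live = new ArrayList<>();
        for (Column c : columns)
            if (c.timestamp > markedForDeleteAt)
                live.add(c);
        return live;
    }
}
```

With this approach the dead columns stay in memory until flush, as the comment
suggests, but never reach disk.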
> row deletes do not remove columns
> ---------------------------------
>
> Key: CASSANDRA-2589
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2589
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.7.5, 0.8 beta 1
> Reporter: Aaron Morton
> Assignee: Aaron Morton
> Priority: Minor
>
> When a row delete is issued, CF.delete() sets the localDeletionTime and
> markedForDeleteAt values but does not remove columns which have a lower
> timestamp. As a result:
> # Memory which could be freed is held on to (probably not too bad, as it is
> already counted)
> # The deleted columns are serialised to disk, along with the CF info to say
> they are no longer valid.
> # NamesQueryFilter and SliceQueryFilter have to do more work as they filter
> out the irrelevant columns using QueryFilter.isRelevant()
> # Also, columns written with a lower timestamp after the deletion are added
> to the CF without checking markedForDeleteAt.
> This can cause read repair (RR) to fail; I will create another ticket for
> that and link it. This ticket is for a fix for removing the columns.
> Two options I could think of:
> # Check for deletion when serialising to an SSTable and ignore columns if
> they have a lower timestamp. Otherwise leave as is, so dead columns stay in
> memory.
> # Ensure at all times if the CF is deleted all columns it contains have a
> higher timestamp.
> ## I *think* this would include all column types (DeletedColumn as well), as
> the CF deletion has the same effect, but I am not sure.
> ## Deleting (potentially) all columns in delete() will take time. We could
> track the highest timestamp in the CF so the normal case of deleting all
> columns does not need to iterate.
>
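The highest-timestamp optimisation from option 2, combined with the
markedForDeleteAt check on writes that point 4 of the bug list asks for, could
look roughly like this sketch. All names here are hypothetical and do not match
the real Cassandra classes.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class RowDeleteSketch {
    final Map<String, Long> columns = new HashMap<>(); // column name -> timestamp
    long maxTimestamp = Long.MIN_VALUE;
    long markedForDeleteAt = Long.MIN_VALUE;

    void addColumn(String name, long ts) {
        // Per point 4 of the ticket: a write older than the row tombstone is
        // already dead, so do not add it to the CF at all.
        if (ts <= markedForDeleteAt)
            return;
        columns.put(name, ts);
        maxTimestamp = Math.max(maxTimestamp, ts);
    }

    void delete(long ts) {
        markedForDeleteAt = Math.max(markedForDeleteAt, ts);
        if (maxTimestamp <= markedForDeleteAt) {
            // Fast path: every column is shadowed, no iteration needed.
            columns.clear();
            return;
        }
        // Slow path: only some columns are shadowed; remove just those.
        for (Iterator<Long> it = columns.values().iterator(); it.hasNext(); )
            if (it.next() <= markedForDeleteAt)
                it.remove();
    }
}
```

The fast path covers the common case of "delete the whole row", which is the
scenario the ticket says should not need to iterate.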