Oleg Anastasyev created CASSANDRA-6446:
------------------------------------------

             Summary: Faster range tombstones on wide rows
                 Key: CASSANDRA-6446
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6446
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Oleg Anastasyev


With wide CQL rows (~1M rows in a single partition), after deleting some of them 
we found inefficiencies in the handling of range tombstones on both the write 
and read paths.

I attached 2 patches here, one for the write path 
(RangeTombstonesWriteOptimization.diff) and another for the read path 
(RangeTombstonesReadOptimization.diff).

On the write path, when you delete some CQL rows by primary key, each deletion 
is represented by a range tombstone. On put of this tombstone into the 
memtable, the original code takes all columns of the partition from the 
memtable and checks DeletionInfo.isDeleted in a brute-force loop to decide 
whether each column should stay in the memtable or was deleted by the new 
tombstone. Needless to say, the more columns you have in a partition, the 
slower deletions become, heating your CPU with brute-force range tombstone 
checks. 
The RangeTombstonesWriteOptimization.diff patch, for partitions with more than 
10000 columns, loops over the tombstones instead and checks the existence of 
columns for each of them. It also copies the whole memtable range tombstone 
list only if there are changes to be made there (the original code copies the 
range tombstone list on every write).
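To illustrate the idea (not the actual Cassandra code), here is a minimal sketch 
assuming a simplified model where columns are keyed by an integer clustering 
position and a range tombstone covers an inclusive [start, end] interval. The 
class and method names are hypothetical; the point is the change in iteration 
order, from "every column against every tombstone" to "every tombstone against 
its own column range":

```java
import java.util.*;

public class TombstoneApply {
    // Hypothetical simplification: a range tombstone deletes all columns
    // whose clustering key falls in [start, end].
    record RangeTombstone(int start, int end) {}

    // Original approach: every column in the partition is checked against
    // the whole tombstone list -- O(columns * tombstones).
    static void applyByColumns(NavigableMap<Integer, String> columns,
                               List<RangeTombstone> tombstones) {
        columns.keySet().removeIf(k ->
            tombstones.stream().anyMatch(t -> t.start() <= k && k <= t.end()));
    }

    // Patched approach for wide partitions: loop over the (typically few)
    // new tombstones and touch only the columns each one actually covers,
    // via a sorted-map range view.
    static void applyByTombstones(NavigableMap<Integer, String> columns,
                                  List<RangeTombstone> tombstones) {
        for (RangeTombstone t : tombstones) {
            columns.subMap(t.start(), true, t.end(), true).clear();
        }
    }
}
```

Both methods produce the same surviving column set; the second avoids visiting 
the ~1M undeleted columns when only a handful of rows were deleted.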

On the read path, the original code scans the whole range tombstone list of a 
partition to match sstable columns to their range tombstones. The 
RangeTombstonesReadOptimization.diff patch scans only the necessary range of 
tombstones, according to the filter used for the read.
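A minimal sketch of that read-path idea, again with hypothetical names and an 
assumed model where tombstones are kept sorted by their start position: for a 
slice query, iteration can stop at the first tombstone starting past the 
slice's end, instead of scanning the partition's entire tombstone list.

```java
import java.util.*;

public class TombstoneSlice {
    record RangeTombstone(int start, int end) {}

    // Return only the tombstones that can affect columns in the query slice
    // [sliceStart, sliceEnd]. The input list is assumed sorted by start, so
    // once a tombstone starts past the slice end, no later one can overlap
    // and the scan terminates early.
    static List<RangeTombstone> tombstonesForSlice(
            List<RangeTombstone> sortedByStart, int sliceStart, int sliceEnd) {
        List<RangeTombstone> out = new ArrayList<>();
        for (RangeTombstone t : sortedByStart) {
            if (t.start() > sliceEnd) break;       // rest of list is irrelevant
            if (t.end() >= sliceStart) out.add(t); // overlaps the slice
        }
        return out;
    }
}
```

So a narrow read filter only ever touches a narrow band of the tombstone list, 
regardless of how many deletions the partition has accumulated.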



--
This message was sent by Atlassian JIRA
(v6.1#6144)