[
https://issues.apache.org/jira/browse/CASSANDRA-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-6446:
----------------------------------------
Attachment: 6446-write-path-v3.txt
6446-Read-patch-v3.txt
Attaching v3 versions that are basically rebased versions of v2. For the write
path though, v2 was checking the number of cells in the updated CF to decide
which code path to use, but with the new BTree implementation, there doesn't
seem to be an easy way to get the number of cells without iterating over all
cells (which would defeat the purpose in that case). It's probably possible to
add such a capability but I didn't dig that far. I simplified it by removing the
code path choice and only keeping the path that always iterates on ranges first,
as it's the one that won't crap itself when things grow. It might be slightly
slower when there is a small number of cells, but 1) it's not even all that
clear that it's slower, 2) this path is only taken if there are secondary indexes to
update and the update applied has range tombstones, and 3) it avoids relying
on some hardcoded and semi-randomly picked constant.
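To illustrate why iterating ranges first scales better, here is a minimal sketch (not the actual Cassandra/BTree code; the class, record, and method names are hypothetical): covered cells are located per tombstone range with a logarithmic lookup, so the cost is driven by the number of ranges and deleted cells rather than the total cell count of the partition.

```java
import java.util.*;

// Hypothetical sketch: apply range tombstones by iterating the ranges first
// and locating the covered cells with tree lookups, instead of testing every
// cell in the partition against every tombstone.
public class RangeTombstoneSketch {
    // A tombstone covering cell names in [start, end], inclusive (simplified
    // to integer cell names for the sketch).
    record Range(int start, int end) {}

    static List<Integer> applyTombstones(List<Integer> cells, List<Range> ranges) {
        TreeSet<Integer> live = new TreeSet<>(cells);
        for (Range r : ranges) {
            // subSet locates the covered span in O(log n); clearing the view
            // removes only the cells the tombstone actually deletes.
            live.subSet(r.start, true, r.end, true).clear();
        }
        return new ArrayList<>(live);
    }

    public static void main(String[] args) {
        List<Integer> cells = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        List<Range> tombstones = List.of(new Range(2, 3), new Range(6, 7));
        System.out.println(applyTombstones(cells, tombstones)); // [1, 4, 5, 8]
    }
}
```

With few cells the per-range lookup overhead may make this marginally slower than a straight scan, which is the trade-off discussed above.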
> Faster range tombstones on wide partitions
> ------------------------------------------
>
> Key: CASSANDRA-6446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6446
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Oleg Anastasyev
> Assignee: Oleg Anastasyev
> Fix For: 2.1
>
> Attachments: 0001-6446-write-path-v2.txt,
> 0002-6446-Read-patch-v2.txt, 6446-Read-patch-v3.txt, 6446-write-path-v3.txt,
> RangeTombstonesReadOptimization.diff, RangeTombstonesWriteOptimization.diff
>
>
> With wide CQL rows (~1M in a single partition), after deleting some of
> them, we found inefficiencies in the handling of range tombstones on both
> the write and read paths.
> I attached 2 patches here, one for write path
> (RangeTombstonesWriteOptimization.diff) and another on read
> (RangeTombstonesReadOptimization.diff).
> On the write path, when you delete CQL rows by primary key, each
> deletion is represented by a range tombstone. When this tombstone is put
> into the memtable, the original code takes all columns of the partition from
> the memtable and checks DeletionInfo.isDeleted in a brute-force loop to decide
> whether each column should stay in the memtable or was deleted by the new
> tombstone. Needless to say, the more columns you have in a partition, the
> slower deletions become, heating your CPU with brute-force range tombstone
> checks.
> The RangeTombstonesWriteOptimization.diff patch, for partitions with more
> than 10000 columns, loops over tombstones instead and checks the existence of
> columns for each of them. It also copies the whole memtable range tombstone
> list only if there are changes to be made to it (the original code copies the
> range tombstone list on every write).
> On the read path, the original code scans the whole range tombstone list of a
> partition to match sstable columns to their range tombstones. The
> RangeTombstonesReadOptimization.diff patch scans only the necessary range of
> tombstones, according to the filter used for the read.
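The read-path idea above can be sketched as follows (hypothetical names, not the actual patch): given tombstones sorted by their start bound, a binary search bounds the portion of the list that can possibly overlap the queried slice, so the scan is limited to that prefix instead of the full list.

```java
import java.util.*;

// Hypothetical sketch: restrict the tombstone scan to the slice being read.
// Tombstones are sorted by start; a binary search finds the first tombstone
// starting after the end of the query slice, and only tombstones before that
// point are examined for overlap.
public class TombstoneLookupSketch {
    record Tombstone(int start, int end) {}

    static List<Tombstone> relevant(List<Tombstone> sortedByStart,
                                    int queryStart, int queryEnd) {
        // Binary search for the first tombstone with start > queryEnd;
        // tombstones at or beyond that index cannot overlap the slice.
        int lo = 0, hi = sortedByStart.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedByStart.get(mid).start() <= queryEnd) lo = mid + 1;
            else hi = mid;
        }
        List<Tombstone> result = new ArrayList<>();
        for (int i = 0; i < lo; i++) {
            Tombstone t = sortedByStart.get(i);
            if (t.end() >= queryStart) result.add(t); // overlaps the slice
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tombstone> ts = List.of(
            new Tombstone(1, 2), new Tombstone(4, 9), new Tombstone(12, 15));
        System.out.println(relevant(ts, 5, 10)); // [Tombstone[start=4, end=9]]
    }
}
```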
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)