[
https://issues.apache.org/jira/browse/CASSANDRA-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873641#comment-13873641
]
Oleg Anastasyev edited comment on CASSANDRA-6446 at 1/16/14 5:32 PM:
---------------------------------------------------------------------
I am not feeling comfortable with the 2.1 branch, so not much help from me
here.
Meanwhile, we found a bug in the v2 patch (and, I believe, in v3 as well):
SliceQueryFilter behaves incorrectly when reversed=true.
The bug can be reproduced in cqlsh:
{code}
cqlsh:test> create table testppd ( a int, b int, c text, d bigint, primary key ( a,b,c ) ) with clustering order by ( b desc, c asc);
cqlsh:test> INSERT INTO testppd (a, b, c,d) VALUES ( 100,12,'Malvina',111);
cqlsh:test> INSERT INTO testppd (a, b, c,d) VALUES ( 100,13,'Karabas-Barabas',111);
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13;
a | b | c | d
-----+----+-----------------+-----
100 | 13 | Karabas-Barabas | 111
100 | 12 | Malvina | 111
(2 rows)
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13 order by b desc;
a | b | c | d
-----+----+-----------------+-----
100 | 13 | Karabas-Barabas | 111
100 | 12 | Malvina | 111
(2 rows)
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13 order by b asc;
a | b | c | d
-----+----+-----------------+-----
100 | 12 | Malvina | 111
100 | 13 | Karabas-Barabas | 111
(2 rows)
cqlsh:test> delete from testppd where a=100 and b = 13 and c='Karabas-Barabas';
cqlsh:test>
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13 order by b desc;
a | b | c | d
-----+----+---------+-----
100 | 12 | Malvina | 111
(1 rows)
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13 order by b asc;
a | b | c | d
-----+----+-----------------+-----
100 | 12 | Malvina | 111
100 | 13 | Karabas-Barabas | 111
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
the just-removed record is resurrected
{code}
Fixed it by patching your v2 patch (god, please forgive me) with:
{code}
diff --git a/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java b/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
index f6d2b17..d7fe875 100644
--- a/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
@@ -379,7 +379,8 @@ public class SliceQueryFilter implements IDiskAtomFilter
         return new AbstractIterator<RangeTombstone>()
         {
             private int sliceIdx = 0;
-            private Iterator<RangeTombstone> sliceIter = delInfo.rangeIterator(slices[0].start, slices[0].finish);
+
+            private Iterator<RangeTombstone> sliceIter = reversed ? delInfo.rangeIterator(slices[0].finish, slices[0].start) : delInfo.rangeIterator(slices[0].start, slices[0].finish);
             protected RangeTombstone computeNext()
             {
{code}
i.e. by reversing the start and finish of the slice when the slice filter is reversed.
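To make the effect of the swap concrete, here is a small self-contained toy model
(assumed names only: the Range class and rangeIterator helper below are hypothetical
stand-ins for DeletionInfo's tombstone lookup, and the bounds are plain ints instead
of clustering prefixes). A reversed slice carries its bounds in reverse order, so a
lookup that expects (low, high) sees an empty interval, misses the tombstone, and lets
the deleted row resurrect:
{code}
import java.util.ArrayList;
import java.util.List;

// Toy model of the bug, not Cassandra code.
public class ReversedSliceDemo
{
    static class Range
    {
        final int min, max;
        Range(int min, int max) { this.min = min; this.max = max; }
        public String toString() { return "[" + min + "," + max + "]"; }
    }

    // Returns tombstones overlapping [lo, hi]; an inverted interval (lo > hi) matches nothing.
    static List<Range> rangeIterator(List<Range> tombstones, int lo, int hi)
    {
        List<Range> out = new ArrayList<>();
        for (Range t : tombstones)
            if (t.min <= hi && t.max >= lo)
                out.add(t);
        return out;
    }

    public static void main(String[] args)
    {
        List<Range> tombstones = List.of(new Range(13, 13)); // row with b=13 was deleted
        int start = 13, finish = 12;                         // reversed slice: start >= finish

        // Unswapped bounds (the bug): inverted interval, tombstone missed, row resurrected.
        System.out.println(rangeIterator(tombstones, start, finish)); // []
        // Swapped bounds (the fix): tombstone found, the deleted row stays hidden.
        System.out.println(rangeIterator(tombstones, finish, start)); // [[13,13]]
    }
}
{code}
In the real code the bounds are clustering prefixes compared by the table's comparator
rather than ints, but the failure mode is the same: with reversed=true the unswapped
call asks for an effectively empty range, so the read never applies the deletion.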
> Faster range tombstones on wide partitions
> ------------------------------------------
>
> Key: CASSANDRA-6446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6446
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Oleg Anastasyev
> Assignee: Oleg Anastasyev
> Fix For: 2.1
>
> Attachments: 0001-6446-write-path-v2.txt,
> 0002-6446-Read-patch-v2.txt, 6446-Read-patch-v3.txt, 6446-write-path-v3.txt,
> RangeTombstonesReadOptimization.diff, RangeTombstonesWriteOptimization.diff
>
>
> Having wide CQL rows (~1M in a single partition) and after deleting some of
> them, we found inefficiencies in handling of range tombstones on both the
> write and read paths.
> I attached 2 patches here, one for the write path
> (RangeTombstonesWriteOptimization.diff) and another for the read path
> (RangeTombstonesReadOptimization.diff).
> On the write path, when you have some CQL row deletions by primary key, each
> deletion is represented by a range tombstone. On put of this tombstone into
> the memtable, the original code takes all columns of the partition from the
> memtable and checks DeletionInfo.isDeleted in a brute-force loop to decide
> whether each column should stay in the memtable or was deleted by the new
> tombstone. Needless to say, the more columns you have in a partition, the
> slower deletions become, heating your CPU with brute-force range tombstone
> checks.
> For partitions with more than 10000 columns, the
> RangeTombstonesWriteOptimization.diff patch loops over tombstones instead and
> checks the existence of columns for each of them. It also copies the whole
> memtable range tombstone list only if there are changes to be made there (the
> original code copies the range tombstone list on every write).
> On the read path, the original code scans the whole range tombstone list of a
> partition to match sstable columns to their range tombstones. The
> RangeTombstonesReadOptimization.diff patch scans only the necessary range of
> tombstones, according to the filter used for the read.
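For what it's worth, here is a rough toy sketch of the read-path idea described above
(scan only the tombstones the query filter can touch, instead of the whole list),
assuming the tombstones are kept sorted by start and non-overlapping; the class and
method names are made up for illustration and are not Cassandra's actual data
structures. The write-path optimization is the mirror image of the same idea: iterate
over the (few) tombstones and probe for matching columns, rather than testing every
column of a wide partition against DeletionInfo.
{code}
import java.util.ArrayList;
import java.util.List;

// Toy sketch, not Cassandra code: tombstones sorted by start and non-overlapping,
// each covering the clustering interval [start, finish] (modelled as ints here).
public class TombstoneRangeScan
{
    static class Tombstone
    {
        final int start, finish;
        Tombstone(int start, int finish) { this.start = start; this.finish = finish; }
    }

    // Original behaviour as described: every tombstone in the partition is visited.
    static List<Tombstone> scanAll(List<Tombstone> sorted, int lo, int hi)
    {
        List<Tombstone> out = new ArrayList<>();
        for (Tombstone t : sorted)
            if (t.start <= hi && t.finish >= lo)
                out.add(t);
        return out;
    }

    // Patch idea: binary-search the first tombstone that can still overlap the queried
    // slice, then stop as soon as tombstone starts move past the slice's upper bound.
    static List<Tombstone> scanRange(List<Tombstone> sorted, int lo, int hi)
    {
        List<Tombstone> out = new ArrayList<>();
        for (int i = firstCandidate(sorted, lo); i < sorted.size() && sorted.get(i).start <= hi; i++)
            out.add(sorted.get(i));
        return out;
    }

    // First index whose tombstone finishes at or after lo (finishes are sorted too,
    // because the ranges are non-overlapping and sorted by start).
    static int firstCandidate(List<Tombstone> sorted, int lo)
    {
        int left = 0, right = sorted.size();
        while (left < right)
        {
            int mid = (left + right) >>> 1;
            if (sorted.get(mid).finish < lo) left = mid + 1;
            else right = mid;
        }
        return left;
    }

    public static void main(String[] args)
    {
        List<Tombstone> tombstones = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i += 10)          // ~100k tombstones on a wide partition
            tombstones.add(new Tombstone(i, i + 4));

        // Both approaches return the same tombstones for the slice [500000, 500020],
        // but scanRange only touches a handful of entries instead of all of them.
        System.out.println(scanAll(tombstones, 500_000, 500_020).size());
        System.out.println(scanRange(tombstones, 500_000, 500_020).size());
    }
}
{code}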
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)