[
https://issues.apache.org/jira/browse/CASSANDRA-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873641#comment-13873641
]
Oleg Anastasyev edited comment on CASSANDRA-6446 at 1/16/14 5:32 PM:
---------------------------------------------------------------------
I am not feeling comfortable with the 2.1 branch, so not much help from me
here.
Meanwhile, we found a bug in the v2 patch (and, I believe, in v3 as well):
SliceQueryFilter behaves incorrectly when reversed=true.
The bug can be reproduced in cqlsh:
{code}
cqlsh:test> create table testppd ( a int, b int, c text, d bigint, primary key ( a,b,c ) ) with clustering order by ( b desc, c asc);
cqlsh:test> INSERT INTO testppd (a, b, c,d) VALUES ( 100,12,'Malvina',111);
cqlsh:test> INSERT INTO testppd (a, b, c,d) VALUES ( 100,13,'Karabas-Barabas',111);
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13;
a | b | c | d
-----+----+-----------------+-----
100 | 13 | Karabas-Barabas | 111
100 | 12 | Malvina | 111
(2 rows)
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13 order by b desc;
a | b | c | d
-----+----+-----------------+-----
100 | 13 | Karabas-Barabas | 111
100 | 12 | Malvina | 111
(2 rows)
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13 order by b asc;
a | b | c | d
-----+----+-----------------+-----
100 | 12 | Malvina | 111
100 | 13 | Karabas-Barabas | 111
(2 rows)
cqlsh:test> delete from testppd where a=100 and b = 13 and c='Karabas-Barabas';
cqlsh:test>
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13 order by b desc;
a | b | c | d
-----+----+---------+-----
100 | 12 | Malvina | 111
(1 rows)
cqlsh:test> select * from testppd where a=100 and b>11 and b <=13 order by b asc;
a | b | c | d
-----+----+-----------------+-----
100 | 12 | Malvina | 111
100 | 13 | Karabas-Barabas | 111
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
the just-removed record is resurrected
{code}
Fixed it by patching your v2 patch (god, please forgive me) with:
{code}
diff --git a/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java b/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
index f6d2b17..d7fe875 100644
--- a/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
@@ -379,7 +379,8 @@ public class SliceQueryFilter implements IDiskAtomFilter
         return new AbstractIterator<RangeTombstone>()
         {
             private int sliceIdx = 0;
-            private Iterator<RangeTombstone> sliceIter = delInfo.rangeIterator(slices[0].start, slices[0].finish);
+
+            private Iterator<RangeTombstone> sliceIter = reversed ? delInfo.rangeIterator(slices[0].finish, slices[0].start) : delInfo.rangeIterator(slices[0].start, slices[0].finish);
             protected RangeTombstone computeNext()
             {
{code}
i.e. by reversing the start and finish of the slice when the slice filter is reversed.
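To make the effect of the swap concrete, here is a small self-contained toy model
(assumed names only: the Range class and rangeIterator helper below are hypothetical
stand-ins for DeletionInfo's tombstone lookup, and the bounds are plain ints instead
of clustering prefixes). A reversed slice carries its bounds in reverse order, so a
lookup that expects (low, high) sees an empty interval, misses the tombstone, and lets
the deleted row resurrect:
{code}
import java.util.ArrayList;
import java.util.List;

// Toy model of the bug, not Cassandra code.
public class ReversedSliceDemo
{
    static class Range
    {
        final int min, max;
        Range(int min, int max) { this.min = min; this.max = max; }
        public String toString() { return "[" + min + "," + max + "]"; }
    }

    // Returns tombstones overlapping [lo, hi]; an inverted interval (lo > hi) matches nothing.
    static List<Range> rangeIterator(List<Range> tombstones, int lo, int hi)
    {
        List<Range> out = new ArrayList<>();
        for (Range t : tombstones)
            if (t.min <= hi && t.max >= lo)
                out.add(t);
        return out;
    }

    public static void main(String[] args)
    {
        List<Range> tombstones = List.of(new Range(13, 13)); // row with b=13 was deleted
        int start = 13, finish = 12;                         // reversed slice: start >= finish

        // Unswapped bounds (the bug): inverted interval, tombstone missed, row resurrected.
        System.out.println(rangeIterator(tombstones, start, finish)); // []
        // Swapped bounds (the fix): tombstone found, the deleted row stays hidden.
        System.out.println(rangeIterator(tombstones, finish, start)); // [[13,13]]
    }
}
{code}
In the real code the bounds are clustering prefixes compared by the table's comparator
rather than ints, but the failure mode is the same: with reversed=true the unswapped
call asks for an effectively empty range, so the read never applies the deletion.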
> Faster range tombstones on wide partitions
> ------------------------------------------
>
> Key: CASSANDRA-6446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6446
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Oleg Anastasyev
> Assignee: Oleg Anastasyev
> Fix For: 2.1
>
> Attachments: 0001-6446-write-path-v2.txt,
> 0002-6446-Read-patch-v2.txt, 6446-Read-patch-v3.txt, 6446-write-path-v3.txt,
> RangeTombstonesReadOptimization.diff, RangeTombstonesWriteOptimization.diff
>
>
> Having wide CQL rows (~1M in a single partition) and after deleting some of
> them, we found inefficiencies in handling of range tombstones on both the
> write and read paths.
> I attached 2 patches here, one for the write path
> (RangeTombstonesWriteOptimization.diff) and another for the read path
> (RangeTombstonesReadOptimization.diff).
> On the write path, when you have some CQL row deletions by primary key, each
> deletion is represented by a range tombstone. On put of this tombstone into
> the memtable, the original code takes all columns of the partition from the
> memtable and checks DeletionInfo.isDeleted in a brute-force loop to decide
> whether each column should stay in the memtable or was deleted by the new
> tombstone. Needless to say, the more columns you have in a partition, the
> slower deletions become, heating your CPU with brute-force range tombstone
> checks.
> For partitions with more than 10000 columns, the
> RangeTombstonesWriteOptimization.diff patch loops over tombstones instead and
> checks the existence of columns for each of them. It also copies the whole
> memtable range tombstone list only if there are changes to be made there (the
> original code copies the range tombstone list on every write).
> On the read path, the original code scans the whole range tombstone list of a
> partition to match sstable columns to their range tombstones. The
> RangeTombstonesReadOptimization.diff patch scans only the necessary range of
> tombstones, according to the filter used for the read.
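For what it's worth, here is a rough toy sketch of the read-path idea described above
(scan only the tombstones the query filter can touch, instead of the whole list),
assuming the tombstones are kept sorted by start and non-overlapping; the class and
method names are made up for illustration and are not Cassandra's actual data
structures. The write-path optimization is the mirror image of the same idea: iterate
over the (few) tombstones and probe for matching columns, rather than testing every
column of a wide partition against DeletionInfo.
{code}
import java.util.ArrayList;
import java.util.List;

// Toy sketch, not Cassandra code: tombstones sorted by start and non-overlapping,
// each covering the clustering interval [start, finish] (modelled as ints here).
public class TombstoneRangeScan
{
    static class Tombstone
    {
        final int start, finish;
        Tombstone(int start, int finish) { this.start = start; this.finish = finish; }
    }

    // Original behaviour as described: every tombstone in the partition is visited.
    static List<Tombstone> scanAll(List<Tombstone> sorted, int lo, int hi)
    {
        List<Tombstone> out = new ArrayList<>();
        for (Tombstone t : sorted)
            if (t.start <= hi && t.finish >= lo)
                out.add(t);
        return out;
    }

    // Patch idea: binary-search the first tombstone that can still overlap the queried
    // slice, then stop as soon as tombstone starts move past the slice's upper bound.
    static List<Tombstone> scanRange(List<Tombstone> sorted, int lo, int hi)
    {
        List<Tombstone> out = new ArrayList<>();
        for (int i = firstCandidate(sorted, lo); i < sorted.size() && sorted.get(i).start <= hi; i++)
            out.add(sorted.get(i));
        return out;
    }

    // First index whose tombstone finishes at or after lo (finishes are sorted too,
    // because the ranges are non-overlapping and sorted by start).
    static int firstCandidate(List<Tombstone> sorted, int lo)
    {
        int left = 0, right = sorted.size();
        while (left < right)
        {
            int mid = (left + right) >>> 1;
            if (sorted.get(mid).finish < lo) left = mid + 1;
            else right = mid;
        }
        return left;
    }

    public static void main(String[] args)
    {
        List<Tombstone> tombstones = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i += 10)          // ~100k tombstones on a wide partition
            tombstones.add(new Tombstone(i, i + 4));

        // Both approaches return the same tombstones for the slice [500000, 500020],
        // but scanRange only touches a handful of entries instead of all of them.
        System.out.println(scanAll(tombstones, 500_000, 500_020).size());
        System.out.println(scanRange(tombstones, 500_000, 500_020).size());
    }
}
{code}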
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)