[jira] [Updated] (CASSANDRA-4579) CQL queries using LIMIT sometimes missing results

Sylvain Lebresne (JIRA) Tue, 04 Sep 2012 06:41:15 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sylvain Lebresne updated CASSANDRA-4579:
----------------------------------------

    Attachment: 0002-Fix-LIMIT-for-NamesQueryFilter.txt
                0001-Add-all-columns-from-a-prefix-group-before-stopping.txt

There are indeed 2 bugs in how columns are counted with composites (both 
introduced by the change made for collections, so 1.1 in particular is not 
affected).

The first one is that, to count the number of CQL rows to return, 
SliceQueryFilter groups columns having the same composite prefix (i.e. all the 
columns belonging to the same CQL row) and counts each group as 1. However, the 
code stopped collecting columns as soon as the requested count was reached, 
without waiting to see all the columns of the last "group".
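
To illustrate the first bug, here is a minimal sketch in Python. It is a toy 
model, not the actual SliceQueryFilter code: the cell layout, the CELLS data 
and the function names collect_buggy, collect_fixed and to_rows are all 
hypothetical, chosen to mirror the first test case from the ticket.

```python
# Toy model (assumption): cells are stored flat as
# (composite_prefix, column_name, value), sorted by prefix.
CELLS = [
    (11, "c", 111), (11, "d", 1111), (11, "e", 11111),
    (22, "c", 222), (22, "d", 2222), (22, "e", 22222),
    (33, "c", 333), (33, "d", 3333), (33, "e", 33333),
]

_NO_PREFIX = object()  # sentinel: no group seen yet


def collect_buggy(cells, limit):
    """Stops as soon as the group count reaches the limit."""
    collected, groups, last_prefix = [], 0, _NO_PREFIX
    for prefix, name, value in cells:
        if prefix != last_prefix:
            groups += 1
            last_prefix = prefix
        collected.append((prefix, name, value))
        if groups >= limit:
            break  # the last group only got its first cell
    return collected


def collect_fixed(cells, limit):
    """Stops only when a new group would exceed the limit."""
    collected, groups, last_prefix = [], 0, _NO_PREFIX
    for prefix, name, value in cells:
        if prefix != last_prefix:
            if groups == limit:
                break  # previous group is complete, safe to stop
            groups += 1
            last_prefix = prefix
        collected.append((prefix, name, value))
    return collected


def to_rows(cells):
    """Rebuild one dict per CQL row from the collected cells."""
    rows = {}
    for prefix, name, value in cells:
        rows.setdefault(prefix, {})[name] = value
    return rows


if __name__ == "__main__":
    print(to_rows(collect_buggy(CELLS, 2)))
    print(to_rows(collect_fixed(CELLS, 2)))
```

In this model the buggy variant returns the last row with only its first 
non-primary-key column set, which is the symptom described as the first 
failure mode in the ticket.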

The second one is that for NamesQueryFilter, each internal Cassandra row will 
yield exactly one CQL row, so we must use the "count keys" rather than "count 
columns" argument for getRangeSlice in that case.
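
The arithmetic behind the second bug can be sketched the same way. Again this 
is only a toy model in Python: range_slice, its count_columns flag and the 
ROWS data are hypothetical and do not reflect the real getRangeSlice 
signature. It assumes each internal row holds exactly the two columns b and c 
from the second test case.

```python
# Toy model (assumption): each internal row is (key, {column: value}).
ROWS = [
    (1, {"b": 11, "c": 111}),
    (2, {"b": 22, "c": 222}),
    (3, {"b": 33, "c": 333}),
    (4, {"b": 44, "c": 444}),
]


def range_slice(rows, limit, count_columns):
    """Return the keys selected under a limit.

    count_columns=True  charges every column against the limit (buggy here).
    count_columns=False charges one unit per row key (the intended behavior).
    """
    result = []
    remaining = limit
    for key, columns in rows:
        if remaining <= 0:
            break
        result.append(key)
        remaining -= len(columns) if count_columns else 1
    return result


if __name__ == "__main__":
    for n in range(1, 6):
        by_columns = len(range_slice(ROWS, n, count_columns=True))
        by_keys = len(range_slice(ROWS, n, count_columns=False))
        print(f"LIMIT {n}: {by_columns} rows by columns, {by_keys} rows by keys")
```

Under these assumptions, counting columns yields 1, 1, 2, 2 and 3 rows for 
LIMIT 1 through 5, matching the counts reported in the second test case, 
while counting keys yields 1, 2, 3, 4 and 4.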

Attached are fixes for both (I've pushed a dtest with the two examples from 
this ticket).

                
> CQL queries using LIMIT sometimes missing results
> -------------------------------------------------
>
>                 Key: CASSANDRA-4579
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4579
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0 beta 1
>            Reporter: paul cannon
>            Assignee: Sylvain Lebresne
>              Labels: cql, cql3
>             Fix For: 1.2.0
>
>         Attachments: 
> 0001-Add-all-columns-from-a-prefix-group-before-stopping.txt, 
> 0002-Fix-LIMIT-for-NamesQueryFilter.txt
>
>
> In certain conditions, CQL queries using LIMIT clauses are not being given 
> all of the expected results (whether unset column values or missing rows).
> Here are the condition sets I've been able to identify:
> First mode: all rows are returned, but in the last row of results, all 
> columns which are not part of the primary key receive no values, except for 
> the first non-primary-key column. Conditions:
>  * Table has a multi-component primary key
>  * Table has more than one column which is not a component of the primary key
>  * The number of results which would be returned by a query is equal to or 
> more than the specified LIMIT
> Second mode: result has fewer rows than it should, lower than both the LIMIT 
> and the actual number of matching rows. Conditions:
>  * Table has a single-column primary key
>  * Table has more than one column which is not a component of the primary key
>  * The number of results which would be returned by a query is equal to or 
> more than the specified LIMIT
> It would make sense to me that this would have started with CASSANDRA-4329, 
> but bisecting indicates that this behavior started with commit 
> 91bdf7fb4220b27e9566c6673bf5dbd14153017c, implementing CASSANDRA-3647.
> Test case for the first failure mode:
> {noformat}
> DROP KEYSPACE test;
> CREATE KEYSPACE test
>     WITH strategy_class = 'SimpleStrategy'
>     AND strategy_options:replication_factor = 1;
> USE test;
> CREATE TABLE testcf (
>     a int,
>     b int,
>     c int,
>     d int,
>     e int,
>     PRIMARY KEY (a, b)
> );
> INSERT INTO testcf (a, b, c, d, e) VALUES (1, 11, 111, 1111, 11111);
> INSERT INTO testcf (a, b, c, d, e) VALUES (2, 22, 222, 2222, 22222);
> INSERT INTO testcf (a, b, c, d, e) VALUES (3, 33, 333, 3333, 33333);
> INSERT INTO testcf (a, b, c, d, e) VALUES (4, 44, 444, 4444, 44444);
> SELECT * FROM testcf;
> SELECT * FROM testcf LIMIT 1; -- columns d and e in result row are null
> SELECT * FROM testcf LIMIT 2; -- columns d and e in last result row are null
> SELECT * FROM testcf LIMIT 3; -- columns d and e in last result row are null
> SELECT * FROM testcf LIMIT 4; -- columns d and e in last result row are null
> SELECT * FROM testcf LIMIT 5; -- results are correct (4 rows returned)
> {noformat}
> Test case for the second failure mode:
> {noformat}
> CREATE KEYSPACE test
>     WITH strategy_class = 'SimpleStrategy'
>     AND strategy_options:replication_factor = 1;
> USE test;
> CREATE TABLE testcf (
>     a int primary key,
>     b int,
>     c int,
> );
> INSERT INTO testcf (a, b, c) VALUES (1, 11, 111);
> INSERT INTO testcf (a, b, c) VALUES (2, 22, 222);
> INSERT INTO testcf (a, b, c) VALUES (3, 33, 333);
> INSERT INTO testcf (a, b, c) VALUES (4, 44, 444);
> SELECT * FROM testcf;
> SELECT * FROM testcf LIMIT 1; -- gives 1 row
> SELECT * FROM testcf LIMIT 2; -- gives 1 row
> SELECT * FROM testcf LIMIT 3; -- gives 2 rows
> SELECT * FROM testcf LIMIT 4; -- gives 2 rows
> SELECT * FROM testcf LIMIT 5; -- gives 3 rows
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
