[ 
https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942757#comment-13942757
 ] 

Bill Mitchell commented on CASSANDRA-6825:
------------------------------------------

I've attached testdb_1395372407904.zip, an archive of the 
data/testdb_1395372407904 directory taken after the test ran.  After the test 
completed, a select count(*) on sr reported 100000 rows:

cqlsh:testdb_1395372407904> select count(*) from sr limit 100000;

 count
--------
 100000

(1 rows)

When I ran select count(*) against each of the six partitions, the counts 
totaled only 90000:
cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 0 LIMIT 100000;

 count
-------
 20000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 1 LIMIT 100000;

 count
-------
 20000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 2 LIMIT 100000;

 count
-------
 10000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 3 LIMIT 100000;

 count
-------
 10000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 4 LIMIT 100000;

 count
-------
 10000

(1 rows)

cqlsh:testdb_1395372407904> select count(*) from sr where siteID = '4CA4F79E-3AB2-41C5-AE42-C7009736F1D5' and listID = 24 and partition = 5 LIMIT 100000;

 count
-------
 20000

(1 rows)
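
For reference, the six per-partition counts above can be totaled and compared 
with the full-table count.  This is only an arithmetic check on the numbers 
copied from the cqlsh output; the variable names are illustrative and it does 
not query Cassandra:

```python
# Counts copied from the cqlsh output above, keyed by the partition value.
partition_counts = {0: 20000, 1: 20000, 2: 10000, 3: 10000, 4: 10000, 5: 20000}

# Result of the unrestricted select count(*) on sr.
full_table_count = 100000

# Total of the per-partition counts and the shortfall against the full table.
partition_total = sum(partition_counts.values())
missing = full_table_count - partition_total

print(f"sum of per-partition counts: {partition_total}")
print(f"rows not counted: {missing}")
```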

As it turns out, the 10000 uncounted rows were all from partition=2, and their 
createDate is identical, except in the milliseconds, to that of 10000 rows 
that do appear.  The uncounted rows are presumably the same rows that did not 
come back on the SELECT query (CASSANDRA-6826); their common key values are 
siteID=4CA4F79E-3AB2-41C5-AE42-C7009736F1D5,listID=24,partition=2,createDate=2014-03-20T22:27:26.457-0500.


> COUNT(*) with WHERE not finding all the matching rows
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6825
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: quad core Windows7 x64, single node cluster
> Cassandra 2.0.5
>            Reporter: Bill Mitchell
>            Assignee: Tyler Hobbs
>         Attachments: cassandra.log, selectpartitions.zip, 
> selectrowcounts.txt, testdb_1395372407904.zip
>
>
> Investigating another problem, I needed to do COUNT(*) on the several 
> partitions of a table immediately after a test case ran, and I discovered 
> that count(*) on the full table and on each of the partitions returned 
> different counts.  
> In one particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the 
> expected count from the test, 99999 rows.  The composite primary key splits 
> the logical row into six distinct partitions, and when I issue a query asking 
> for the total across all six partitions, the returned result is only 83999.  
> Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11 AND 
> partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical 
> WHERE predicate reports only 14,000. 
> This is failing immediately after running a single small test, such that 
> there are only two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to 
> run.  
> In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect 
> count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
