[jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct

Benjamin Lerer (JIRA) Tue, 28 Apr 2015 06:00:11 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516985#comment-14516985 ]


Benjamin Lerer commented on CASSANDRA-8940:
-------------------------------------------

{quote}Thanks for the update. I guess you are on to something. Again, if 
there's anything I can help with, I'm happy to pitch in.{quote}

Thanks for the offer :-). For the moment, I am just digging.

{quote}
(a bit off topic): I wasn't aware that Cassandra performs the count on the 
coordinator. I wonder why one couldn't push the count operator to the replicas 
involved. I see that aggregate functions in Cassandra trunk are implemented in 
a similar fashion. A pity if you ask me.{quote}

The advantage of this approach was that the consistency problem was already 
solved. The coordinator was guaranteed to have the latest data. 
The plan was to deliver that initial version first and to make it better in the 
future. If you are interested in it, you can follow CASSANDRA-8826. 
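
For illustration only, here is a minimal Python sketch of what counting on the coordinator amounts to: the coordinator pages through the rows of an ordinary select and counts them itself, rather than asking each replica for a partial count. The names fetch_page and count_on_coordinator and the page size are hypothetical; this is not Cassandra's internal code.

```python
from typing import List, Sequence, Tuple


def fetch_page(rows: Sequence[int], start: int, page_size: int) -> Tuple[List[int], int]:
    """Hypothetical stand-in for reading one page of an ordinary select."""
    page = list(rows[start:start + page_size])
    # Return the page together with the position where the next page starts.
    return page, start + len(page)


def count_on_coordinator(rows: Sequence[int], page_size: int = 100) -> int:
    """Count rows by paging through them, one page at a time."""
    total = 0
    position = 0
    while True:
        page, position = fetch_page(rows, position, page_size)
        if not page:
            # An empty page means the result set is exhausted.
            break
        total += len(page)
    return total
```

In a model like this, the count is only correct if every page resumes exactly where the previous one stopped; any row skipped between two pages would make the total come out too low.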

{quote}
As I understand it, select count queries operate on top of normal select all 
queries. Does this mean that this 'skipping' of rows might also be a problem in 
other cases? Or is it only a problem because the result set is processed/paged 
on a Cassandra node and not in a driver?
{quote}

The 'skipping' of rows might apparently be a problem for queries requesting data 
from more than one partition. I do not yet know the extent of the problem.

> Inconsistent select count and select distinct
> ---------------------------------------------
>
>                 Key: CASSANDRA-8940
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8940
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 2.1.2
>            Reporter: Frens Jan Rumph
>            Assignee: Benjamin Lerer
>         Attachments: 7b74fb00-e935-11e4-b10c-317579db7eb4.csv, 
> 8d5899d0-e935-11e4-847b-2d06da75a6cd.csv, Vagrantfile, install_cassandra.sh, 
> setup_hosts.sh
>
>
> When performing {{select count( * ) from ...}} I expect the results to be 
> consistent over multiple query executions if the table at hand is not written 
> to / deleted from in the meantime. However, in my set-up they are not. The 
> counts returned vary considerably (several percent). The same holds for 
> {{select distinct partition-key-columns from ...}}.
> I have a table in a keyspace with replication_factor = 1 which is something 
> like:
> {code}
> CREATE TABLE tbl (
>     id frozen<id_type>,
>     bucket bigint,
>     offset int,
>     value double,
>     PRIMARY KEY ((id, bucket), offset)
> )
> {code}
> The frozen udt is:
> {code}
> CREATE TYPE id_type (
>     tags map<text, text>
> );
> {code}
> The table contains around 35k rows (I'm not trying to be funny here ...). The 
> consistency level for the queries was ONE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
