Mircea Lemnaru created CASSANDRA-11314:
------------------------------------------
Summary: Inconsistent select count(*)
Key: CASSANDRA-11314
URL: https://issues.apache.org/jira/browse/CASSANDRA-11314
Project: Cassandra
Issue Type: Bug
Components: Local Write-Read Paths
Environment: Ubuntu 14.04 LTS
Reporter: Mircea Lemnaru
Hello,
I currently have the following setup:
Cassandra 3.3 (Community Edition, downloaded from DataStax) installed on 3 nodes,
with this table:
CREATE TABLE billing.collected_data_day (
    collection_day int,
    timestamp timestamp,
    record_id uuid,
    dimensions map<text, text>,
    entity_id text,
    measurements map<text, text>,
    source_id text,
    PRIMARY KEY (collection_day, timestamp, record_id)
) WITH CLUSTERING ORDER BY (timestamp ASC, record_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
As you can see, this table is partitioned by collection_day, because at the end
of each day we need fast access to all of the data generated during that day.
collection_day is the number of days since 1 January 1970.
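For illustration, here is a minimal Python sketch of how a client might derive the collection_day partition key. It assumes that "days since 1970" means whole days since 1970-01-01 in UTC; the report does not state the time zone, and the function name collection_day is hypothetical, not part of any Cassandra API.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def collection_day(d):
    """Return the number of whole days between 1970-01-01 and d.

    Assumption (not stated in the report): days are counted in UTC.
    Accepts a datetime.date or a datetime.datetime; naive datetimes
    are treated as UTC.
    """
    # datetime is a subclass of date, so check for it first.
    if isinstance(d, datetime):
        if d.tzinfo is None:
            d = d.replace(tzinfo=timezone.utc)  # assume UTC for naive values
        d = d.astimezone(timezone.utc).date()
    return (d - EPOCH).days
```

Under this assumption, collection_day(date(2015, 1, 27)) returns 16462, the partition queried in the counts below.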
For testing purposes we inserted roughly 12 million rows into this table and
then ran a simple count on a single day. As you can see, the results vary:
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
count
-------
55341
(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
count
-------
55372
(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
count
-------
55300
(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
count
-------
55300
(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
count
-------
55300
(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
count
-------
55303
(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;
count
-------
55374
(1 rows)
I am running the queries from the seed node of the Cassandra cluster. As you can
see, most of the results differ (the count ranges from 55300 to 55374), and I
don't know why. We are not writing anything into the cluster at this time; we
are only querying it, and only through cqlsh.
This looks very similar to CASSANDRA-8940, but that issue targets 2.1.x.
Could we be hitting the same issue in 3.3?
Please let me know what additional information I can provide.