Re: Missing results during range query paging

2017-05-18 Thread Blake Eggleston
That does sound troubling. You mentioned you're reading at local quorum. Did 
you write these control records at quorum, or from the same dc at local quorum? 
What CL/DC are the other records written at?
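
The reason the write CL/DC matters can be sketched with the usual replica-overlap arithmetic: a LOCAL_QUORUM read is only certain to hit a replica holding the write if the replicas that acknowledged the write in the reading DC, plus the replicas read, exceed that DC's replication factor. This is a rough illustration only. The DC names and replication factors are hypothetical (the thread does not state them), and it ignores hints, read repair and anti-entropy repair, which propagate data later.

```python
def quorum(replicas):
    """Quorum size for a given number of replicas."""
    return replicas // 2 + 1


def guaranteed_acks_in_dc(write_cl, write_dc, read_dc, rf_by_dc):
    """Worst-case number of replicas in read_dc that acknowledged the write."""
    rf_read = rf_by_dc[read_dc]
    if write_cl == "LOCAL_QUORUM":
        # Only replicas in the coordinator's DC are waited for.
        return quorum(rf_read) if write_dc == read_dc else 0
    if write_cl == "EACH_QUORUM":
        return quorum(rf_read)
    if write_cl == "QUORUM":
        # Quorum over all replicas; worst case, the other DCs ack first.
        total = sum(rf_by_dc.values())
        return max(0, quorum(total) - (total - rf_read))
    raise ValueError("unsupported consistency level: " + write_cl)


def local_quorum_read_sees_write(write_cl, write_dc, read_dc, rf_by_dc):
    """True if a LOCAL_QUORUM read in read_dc must overlap the write."""
    rf_read = rf_by_dc[read_dc]
    acks = guaranteed_acks_in_dc(write_cl, write_dc, read_dc, rf_by_dc)
    return acks + quorum(rf_read) > rf_read


# Hypothetical topology: two DCs with RF 3 each.
rf = {"dc1": 3, "dc2": 3}
print(local_quorum_read_sees_write("LOCAL_QUORUM", "dc1", "dc1", rf))
print(local_quorum_read_sees_write("LOCAL_QUORUM", "dc2", "dc1", rf))
print(local_quorum_read_sees_write("QUORUM", "dc2", "dc1", rf))
```

Under these assumptions only the first case is guaranteed to overlap, which is why it matters where and at what CL the control records were written.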

On May 17, 2017 at 10:16:42 AM, Dominic Chevalier (dccheval...@gmail.com) wrote:

Hi Folks, 

I've been noticing missing rows, anywhere from 20-40% of them, while 
executing paging queries over my cluster. 

Basically, the query is meant to hit every row. It subdivides the entire token 
range into a few tens of token ranges to parallelize the work, with no 
wraparound involved, and runs at local_quorum:

select * from cf where token(primaryKey) > minimum and token(primaryKey) < 
maximum; 

I have inserted a test-control data set of 100,000 records, among billions of 
live records. The control data set does not change and has no TTL, and queries 
for individual rows at local_quorum return nearly all of the data, so it's very 
strange that paging queries consistently return only 60-80% of what I expect. 
In the past, paging queries have returned almost all of the control data set, 
and they still do in smaller test clusters. 

My suspicion is that something in the cluster state is impacting these results, 
but I have yet to pinpoint anything. Nor have I been able to pinpoint what led 
from consistently 100% paging coverage in the past to consistently much less 
than 100% coverage now.

My cluster is Apache Cassandra 2.1.15, with approximately 100 nodes in the 
local data center. Java driver version 3.1.0.

Thank you,
Dominic



Missing results during range query paging

2017-05-17 Thread Dominic Chevalier
Hi Folks,

I've been noticing missing rows, anywhere from 20-40% of them, while
executing paging queries over my cluster.

Basically, the query is meant to hit every row. It subdivides the entire
token range into a few tens of token ranges to parallelize the work, with no
wraparound involved, and runs at local_quorum:

select * from cf where token(primaryKey) > minimum and token(primaryKey) <
maximum;
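
For illustration, the subdivision described above could look roughly like the sketch below. It assumes the Murmur3Partitioner token bounds (the thread does not say which partitioner is in use), and `split_token_range` and `range_query` are hypothetical helper names, not part of any driver. It uses half-open ranges (`>` start, `<=` end), because with strict inequalities on both ends a row whose token equals a boundary shared by two adjacent subranges would be returned by neither query. That alone would be unlikely to explain a 20-40% gap.

```python
# Assumed Murmur3Partitioner token bounds; the thread does not state the partitioner.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(n, lo=MIN_TOKEN, hi=MAX_TOKEN):
    """Return n contiguous (start, end) pairs that together cover (lo, hi]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    span = hi - lo
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [lo + span * i // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(table, key, start, end):
    """Build the CQL for one subrange: exclusive start, inclusive end."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({key}) > {start} AND token({key}) <= {end};"
    )


# Example: 32 subranges over the table and key names used in this post.
for start, end in split_token_range(32)[:2]:
    print(range_query("cf", "primaryKey", start, end))
```

Each subrange would then be paged through independently at local_quorum.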

I have inserted a test-control data set of 100,000 records, among billions
of live records. The control data set does not change and has no TTL, and
queries for individual rows at local_quorum return nearly all of the data,
so it's very strange that paging queries consistently return only 60-80% of
what I expect. In the past, paging queries have returned almost all of the
control data set, and they still do in smaller test clusters.

My suspicion is that something in the cluster state is impacting these
results, but I have yet to pinpoint anything. Nor have I been able to
pinpoint what led from consistently 100% paging coverage in the past to
consistently much less than 100% coverage now.

My cluster is Apache Cassandra 2.1.15, with approximately 100 nodes in the
local data center. Java driver version 3.1.0.

Thank you,
Dominic