That does sound troubling. You mentioned you're reading at local quorum. Did
you write these control records at quorum, or at local quorum from the same DC?
At what CL, and from which DC, are the other records written?
On May 17, 2017 at 10:16:42 AM, Dominic Chevalier (dccheval...@gmail.com) wrote:
Hi Folks,
I've been noticing missing rows, anywhere from 20-40% of them, while
executing paging queries over my cluster.
Basically, the query is meant to hit every row. The entire token range is
subdivided into a few tens of token ranges to parallelize the work, with no
wrap-around involved, and each range is read at local_quorum:
select * from cf where token(primaryKey) > minimum and token(primaryKey) <
maximum;
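
For illustration, the subdivision described above could be sketched roughly as
follows. This is only a sketch under assumptions: it assumes the
Murmur3Partitioner (token range -2^63 to 2^63-1), the helper names
split_token_range and range_queries are hypothetical, and it uses half-open
ranges (> start, <= end) so that a token sitting exactly on a shared boundary
is read by one subrange rather than by none.

```python
# Token bounds of the Murmur3Partitioner (assumption: the cluster uses it).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_range(n, lo=MIN_TOKEN, hi=MAX_TOKEN):
    """Split [lo, hi] into n contiguous (start, end) pairs without gaps."""
    if n < 1:
        raise ValueError("n must be at least 1")
    span = hi - lo
    # Integer arithmetic only, so the bounds are exact for 64-bit tokens.
    bounds = [lo + (span * i) // n for i in range(n)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(table, pk, n):
    """Build one CQL statement per subrange (hypothetical helper).

    The first subrange uses >= so that the lowest bound is included; every
    other subrange uses > start AND <= end, so neighbours never overlap.
    """
    queries = []
    for i, (start, end) in enumerate(split_token_range(n)):
        op = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({pk}) {op} {start} AND token({pk}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    # Table and key names are taken from the query above.
    for q in range_queries("cf", "primaryKey", 4):
        print(q)
```

With strict > and < on both ends, as in the query above, a row whose token
equals a shared boundary would be skipped by both neighbouring subranges. That
loses at most one token per boundary, so it is worth ruling out but seems
unlikely to account for a 20-40% gap on its own.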
I have inserted a test-control data set of 100,000 records, among billions of
live records. The control data set does not change, does not TTL, and queries
for individual rows at local_quorum return nearly all of the data, so it's very
strange that paging queries consistently return only 60-80% of what I expect. In the
past, paging queries have returned almost all of the control data set, and
still do in smaller test clusters.
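
To make the coverage figure concrete, here is a minimal sketch of how the
control set could be compared against what the paging queries returned, and
how the missing rows could be tallied per token subrange. The function names
are hypothetical, and the sketch assumes the token of each missing key has
already been obtained separately (for example by selecting token(primaryKey)
for that key), since it does not compute Murmur3 hashes itself.

```python
from bisect import bisect_left
from collections import Counter


def coverage(expected_keys, seen_keys):
    """Return (fraction of expected keys that were seen, set of missing keys)."""
    expected = set(expected_keys)
    missing = expected - set(seen_keys)
    if not expected:
        return 1.0, missing
    return (len(expected) - len(missing)) / len(expected), missing


def missing_per_range(missing_tokens, ranges):
    """Count missing tokens per subrange.

    `ranges` is a sorted, contiguous list of (start, end) pairs, each read as
    the half-open interval (start, end]. Returns a Counter keyed by range index.
    """
    ends = [end for _, end in ranges]
    counts = Counter()
    for token in missing_tokens:
        # Index of the first subrange whose upper bound is >= token.
        counts[bisect_left(ends, token)] += 1
    return counts


if __name__ == "__main__":
    ratio, missing = coverage(["a", "b", "c", "d"], ["a", "c", "x"])
    print(ratio, sorted(missing))
```

If the missing rows cluster in a handful of subranges, that points at
particular replicas or ranges; if they are spread evenly, something more
global is probably involved.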
My suspicion is that something in the cluster state is affecting these results,
but I have yet to pinpoint anything. Nor have I been able to pinpoint what
led from consistently 100% paging coverage in the past to consistently far less
than 100% coverage now.
My cluster is Apache Cassandra 2.1.15, with approximately 100 nodes in the
local data center. Java driver version 3.1.0.
Thank you,
Dominic