I've been noticing missing rows, anywhere from 20-40% of them, while
executing paging queries over my cluster.
Basically, the job is to hit every row. I subdivide the entire token range
into a few tens of token ranges to parallelize the work (there is no
wraparound involved) and query each sub-range at local_quorum:
select * from cf where token(primaryKey) > minimum and token(primaryKey) <
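
For illustration, the subdivision described above could be sketched roughly as follows. This is a minimal sketch, not the code actually in use: it assumes the cluster uses Murmur3Partitioner (tokens are signed 64-bit values), the class and method names (TokenRangeSplitter, split, toCql) are hypothetical, and the inclusive upper bound (<=) is an assumption, since the query above is cut off. It only builds the sub-ranges and the CQL text and does not touch the driver.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the full token range into n contiguous sub-ranges.
// Assumes Murmur3Partitioner, whose tokens fit in a signed 64-bit long.
final class TokenRangeSplitter {

    /** Returns n ranges {start, end}, each meant to be read as (start, end]. */
    static List<long[]> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger max = BigInteger.valueOf(Long.MAX_VALUE);
        BigInteger width = max.subtract(min); // BigInteger avoids long overflow
        List<long[]> ranges = new ArrayList<>();
        long start = Long.MIN_VALUE;
        for (int i = 1; i <= n; i++) {
            // The last range is pinned to the maximum token so nothing is left out.
            long end = (i == n)
                    ? Long.MAX_VALUE
                    : min.add(width.multiply(BigInteger.valueOf(i))
                                   .divide(BigInteger.valueOf(n))).longValueExact();
            ranges.add(new long[] {start, end});
            start = end; // next range begins exactly where this one ended
        }
        return ranges;
    }

    /** Builds the CQL for one sub-range; cf and primaryKey are the names from the post. */
    static String toCql(long[] range) {
        return "select * from cf where token(primaryKey) > " + range[0]
                + " and token(primaryKey) <= " + range[1];
    }

    public static void main(String[] args) {
        for (long[] range : split(4)) {
            System.out.println(toCql(range));
        }
    }
}
```

Each range starts exactly where the previous one ended, and the exclusive lower / inclusive upper bounds mean a boundary token belongs to exactly one sub-range. If the real queries use strict bounds on both sides, rows whose token falls exactly on a boundary would be skipped, though that alone would not account for a 20-40% gap.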
I have inserted a test-control data set of 100,000 records, among billions
of live records. The control data set does not change and has no TTL, and
queries for individual rows at local_quorum return nearly all of the data,
so it's very strange that paging queries consistently return only 60-80% of what I
expect. In the past, paging queries have returned almost all of the control
data set, and still do in smaller test clusters.
My suspicion is something in the cluster state is impacting these results,
but I have yet to pinpoint anything. Nor have I been able to pinpoint what
changed to take paging coverage from consistently 100% to consistently a
lot less than 100%.
My cluster is Apache Cassandra 2.1.15, with approximately 100 nodes in the
local data center. The Java driver version is 3.1.0.