Understanding SELECT * paging/ordering

2016-03-19 Thread Dan Checkoway
Say I have a table with 50M rows in a keyspace with RF=3 in a cluster of 15
nodes (single local data center).  When I do "SELECT * FROM table" and page
through those results (with a fetch size of say 1000), I'd like to
understand better how that paging works.

Specifically, what determines the order in which which rows are returned?
And what's happening under the hood...i.e. is the coordinator fetching
pages of 1000 from each node, passing some sort of paging state to each
node, and the coordinator merges the per-node sorted result sets?

I thought maybe the results would be sorted by partition key, but that
doesn't seem to be the case (I'm not 100% sure about this).

I'm also curious how consistency level comes into play.  i.e. if I use ONE
vs. QUORUM vs. ALL, how that impacts where the results come from and how
they're ordered, merged, and who knows what else I don't know...  :-)

Very curious how this works.  Thanks in advance!


Re: Understanding SELECT * paging/ordering

2016-03-18 Thread Tyler Hobbs
On Fri, Mar 18, 2016 at 4:58 PM, Dan Checkoway  wrote:

> Say I have a table with 50M rows in a keyspace with RF=3 in a cluster of
> 15 nodes (single local data center).  When I do "SELECT * FROM table" and
> page through those results (with a fetch size of say 1000), I'd like to
> understand better how that paging works.
>
> Specifically, what determines the order in which which rows are returned?
>

Results are returned in token order (murmur3 hash of the partition key),
and within a single partition, rows are ordered by the clustering key.


>   And what's happening under the hood...i.e. is the coordinator fetching
> pages of 1000 from each node, passing some sort of paging state to each
> node, and the coordinator merges the per-node sorted result sets?
>

The coordinator sequentially[1] queries each token range until it has
enough rows to meet the page size.  When the next page is fetched, it
resumes this process, but starts at the last-used token (which is in the
paging state that the driver passes to the coordinator) rather than the
start of the ring.


> I'm also curious how consistency level comes into play.  i.e. if I use ONE
> vs. QUORUM vs. ALL, how that impacts where the results come from and how
> they're ordered, merged, and who knows what else I don't know...  :-)
>

The only difference between ONE and QUORUM is that the coordinator will
query multiple replicas for each token range and perform the standard
conflict resolution.

[1] In reality, based on estimates of how many token ranges it will need to
query in order to meet the page size, it will query multiple token ranges
in parallel.  See CASSANDRA-1337 for details.

-- 
Tyler Hobbs
DataStax