[
https://issues.apache.org/jira/browse/CASSANDRA-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937711#comment-17937711
]
Nadav Har'El commented on CASSANDRA-20470:
------------------------------------------
As I said above, cql3/validation/operations/SelectTest.java has many tests that
assume that SELECT without ORDER BY returns a specific order and not an
unspecified order. I think the most enlightening example of why this assumption
is usefu is in testClusteringOrderWithSlice():
This test creates a table with a descending WITH CLUSTERING ORDER BY, and then
runs a SELECT query against this table without an explicit "ORDER BY". The
expectation of the test is that the SELECT without ORDER BY will return the
same on-disk order defined at table creation time (the descending order). The
author of this test obviously assumed that the SELECT order is guaranteed to be
the same on-disk order defined at table creation time, and an explicit ORDER BY
is _not needed_ just to request this default order.
> Document SELECT order without ORDER BY
> --------------------------------------
>
> Key: CASSANDRA-20470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20470
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Documentation
> Reporter: Nadav Har'El
> Priority: Normal
>
> In SQL, doing a "SELECT" without an ORDER BY does not guarantee any order of
> the results.
> I've been using Cassandra for many years and until today, I was under the
> impression that in Cassandra, things are different: I thought that SELECT
> officially defaults to the "standard" order which sorts partitions according
> to tokens (or the partition key in case of tie) and inside a partition rows
> are sorted according to the clustering key(s). I was under the impression
> that Cassandra users rarely, if ever, use "ORDER BY ASC(c)" in each SELECT
> query - they just leave it out when wanting the default sort order.
> I think this is impression is also hard-coded in some of the unit tests. For
> example, cql3/validation/operations/SelectTest.java checks the result order
> with various ORDER BY, but its first check actually omits the ORDER BY and
> verifies that the results come in ascending clustering key order.
> But today I realized that this isn't documented anywhere I can find! I
> couldn't find any explicit documentation of what happens in a SELECT
> *without* an "ORDER BY". Is the order completely unspecified (as in SQL) or
> can users expect the default sort order is the "standard" one which I
> mentioned above? My request in this issue is to make the answer to this
> question clear and explicit in the documentation.
> Given that Cassandra already goes to all the trouble to guarantee a specific
> order of the rows on disk (and the on-disk part *is* documented), it seems
> strange not to make the most efficient way to read it - in that same order -
> be the default, and document that it is the default. But if this just "seems"
> to be the default but for some reason is *not* the default, then this fact
> should be documented, so users won't rely on it. Although to be honest, I'll
> be surprised if most Cassandra users don't rely on this already - it seems I
> always did... If this default sort order also can't be relied on, probably
> the unit tests that assume it will need to be changed, to use
> assertRowsIgnoringOrder() or something.
> Interestingly, DynamoDB which whose data model is more-or-less identical to
> Cassandra's, made the explicit decision that the default sort order always
> applies to queries: Their clustering key is called a "sort key" and rows are
> always sorted by it - e.g.,
> [https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html]
> says "Query results are always sorted by the sort key value.".
> Beyond the above, I think we also need to document how sort order applies
> when SELECT uses a secondary index. When the query is restricted by an
> indexed column, an explicit "ORDER BY" is outright forbidden (the error
> message is "ORDER BY with 2ndary indexes is not supported, except for ANN
> queries."), but more importantly it seems that although "usually" the
> standard default sort order is maintained in such queries, this is not
> guaranteed if one node returns a long list of results and the coordinator
> decides not to even try other nodes. I think this fact can cause unpleasant
> surprises for application developers - results will be sorted as expected
> when testing on a a small installation, but when moving to a large many-node
> cluster, suddenly the sort order will no longer be guaranteed. I think this
> should be documented as well.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]