[
https://issues.apache.org/jira/browse/CASSANDRA-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905561#comment-13905561
]
Sylvain Lebresne commented on CASSANDRA-6722:
---------------------------------------------
Good catch. We can't indeed properly do post-query reordering in general if we
page. Note that currently, there is 2 cases where we do post-query reordering:
# with a IN on the partition key and an ORDER BY (the example above).
# if there is an IN on the last clustering column of a compact table.
The 2nd case is actually a bit weird. The reason we order post-query is so the
resultset follows the order of the IN in the query, i.e. if you have the table
above but COMPACT and do
{noformat}
SELECT value FROM paging_test WHERE id=1 AND value IN ('b', 'a', 'c')
{noformat}
you will get [ 'b', 'a', 'c' ] in that order as a result set. So far, why not,
but there is 2 problems in practice:
* we only do that for compact tables. For non-compact ones, we just return
results in clustering order (we don't do the post-query ordering). That
inconsistency is an historical accident.
* this actually take precedence over ORDER BY, which is completely broken. I.e.
even if you add 'ORDER BY value' in the query above, the results will not be
properly ordered.
* this is even more broken than that: CASSANDRA-6701.
This case is a mess. So because it's a problem in the context of this ticket
and because there is no reason to guarantee any special ordering of the result
set if there is no ORDER BY, I suggest we just remove the behavior for compact
storage (it was an implementation detail so far) to make it in line with the
non-compact case, thus avoiding us to have to deal with it here.
Back to the more interesting case of the example in the description: a IN on
the partition key with ORDER BY. In that case, we could almost support paging
properly: the reason it's broken is that the pager queries partitions of the IN
one after the other. But the pager could, in theory, page over each partition
simultaneously, querying them all little by little and doing a merge sort.
It is however not all that easy in practice. If you query a full page of each
partition and there is many partitions in the IN, you'll load tons of data in
memory, defeating in large parts the goal of paging. If you instead query less
than the page size of each partition, you now may need to re-query some of the
partitions depending on what the merge sort yield on those first pages. Not
only would that require serious refactor of the current code to properly
handle, but it's rather unclear how efficient this would really be in general.
I'm not sure it's really worth it in the end.
But in any case, such solution is way out of scope for 2.0 (and probably even
for 2.1 at this point). So we need a quick solution for now and it probably
mean that we should throw an IRE if the query requires post-query paging and
paging is on.
Taking a step back, I wonder if allowing post-query reordering by default was a
good idea, and if the same way we have ALLOW FILTERING, we shouldn't have
required an ALLOW IN-MEMORY REORDERING for such queries. Of course, it's harder
to change that now, but maybe we could still do it by adding such flag,
deprecate queries that don't use it by logging a warning like we've done in
CASSANDRA-6649 for 1 or 2 versions, and forbid it completely afterwards?
> cross-partition ordering should have warning or be disallowed when paging
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-6722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6722
> Project: Cassandra
> Issue Type: Bug
> Reporter: Russ Hatch
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 2.0.6
>
> Attachments: 6722.txt
>
>
> consider this schema/data/query:
> {noformat}
> CREATE TABLE paging_test (
> id int,
> value text,
> PRIMARY KEY (id, value)
> ) WITH CLUSTERING ORDER BY (value ASC)
> |id|value|
> |1 |a |
> |2 |b |
> |1 |c |
> |2 |d |
> |1 |e |
> |2 |f |
> |1 |g |
> |2 |h |
> |1 |i |
> |2 |j |
> select * from paging_test where id in (1,2) order by value asc;
> {noformat}
> When paging the above query I get the sorted results from id=1 first, then
> the sorted results from id=2 after that. I was testing this because I was
> curious if the paging system could somehow globally sort the results but it
> makes sense that we can't do that, since that would require all results to be
> collated up front.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)