[ https://issues.apache.org/jira/browse/CASSANDRA-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Hobbs updated CASSANDRA-8087:
-----------------------------------
Attachment: 8087-2.0.txt
The root of the problem turned out to be {{countCQL3Rows}} being erroneously set
to true in {{PagedRangeCommand}}, because the logic in
{{PagedRangeCommand.countCQL3Rows()}} didn't handle the case of DISTINCT queries
that select static columns. In 2.1 this isn't a problem because we serialize
{{countCQL3Rows}} as part of the message.
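To make the failure mode concrete, here is a toy Java model (not Cassandra's code) of the repro data from the description below: partition k = 1 with clustering values p = 1 and p = 2. As a simplifying assumption, it treats "flag true" as "each CQL row is a result" and "flag false" as "each partition contributes one result". The enum {{SelectKind}} and the {{deriveBuggy}}/{{deriveFixed}} methods are hypothetical stand-ins for the re-derivation that 2.0 performs in {{PagedRangeCommand.countCQL3Rows()}}, since the flag is not serialized there.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every name here is hypothetical, not a Cassandra internal.
public class DistinctCountSketch {
    enum SelectKind { REGULAR, DISTINCT, DISTINCT_WITH_STATICS }

    // A CQL row of the repro table: partition key k, clustering column p.
    static final class Row {
        final int k;
        final int p;
        Row(int k, int p) { this.k = k; this.p = p; }
    }

    // Buggy-style derivation: only plain DISTINCT is recognised as partition-level,
    // so DISTINCT with statics falls through to "count CQL rows".
    static boolean deriveBuggy(SelectKind kind) {
        return kind != SelectKind.DISTINCT;
    }

    // Fixed-style derivation: every DISTINCT variant counts partitions, not CQL rows.
    static boolean deriveFixed(SelectKind kind) {
        return kind == SelectKind.REGULAR;
    }

    // Returns the partition key of each result row a client would receive.
    static List<Integer> resultKeys(List<Row> rows, boolean countCql3Rows) {
        if (countCql3Rows) {
            List<Integer> keys = new ArrayList<>();
            for (Row r : rows) keys.add(r.k); // one result per CQL row
            return keys;
        }
        Set<Integer> partitions = new LinkedHashSet<>();
        for (Row r : rows) partitions.add(r.k); // one result per partition
        return new ArrayList<>(partitions);
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 1), new Row(1, 2));
        SelectKind q = SelectKind.DISTINCT_WITH_STATICS; // SELECT DISTINCT k, s
        System.out.println(resultKeys(rows, deriveBuggy(q))); // [1, 1]
        System.out.println(resultKeys(rows, deriveFixed(q))); // [1]
    }
}
```

The buggy derivation reproduces the reported symptom (one row per clustering value p), while the fixed one yields a single row per partition key, which is what DISTINCT promises.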
The attached patch attempts to update the {{PagedRangeCommand.countCQL3Rows()}}
logic to handle DISTINCT with statics. I believe this logic is safe, but I'm
not 100% sure. (It doesn't seem to cause any regressions in the tests.) The
patch also adds a bit of documentation and some {{toString()}} methods to
clarify things that confused me while debugging.
Lastly, the patch fixes an overcounting problem in
{{SliceQueryFilter.lastCounted()}}. This fix turned out not to be required for
this ticket, but I figured it's good to prevent a possible future bug.
The overcounting happens because in
{{SliceQueryFilter.collectReducedColumns()}}, we have to call
{{columnCounter.count()}} _before_ adding cells to the container, and we only
break once the count _exceeds_ the limit. So, whenever the limit is exceeded,
the counter ends up one higher than the number of cells actually added. In
practice, this doesn't seem to cause any problems (due to the conditions under
which {{collectReducedColumns()}} is called and when we set a limit on the
slice), but it's definitely erroneous.
I also extended the failing dtest here:
https://github.com/thobbs/cassandra-dtest/tree/CASSANDRA-8087
> Multiple non-DISTINCT rows returned when page_size set
> ------------------------------------------------------
>
> Key: CASSANDRA-8087
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8087
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Adam Holmberg
> Assignee: Tyler Hobbs
> Priority: Minor
> Fix For: 2.0.12
>
> Attachments: 8087-2.0.txt
>
>
> Using the following statements to reproduce:
> {code}
> CREATE TABLE test (
> k int,
> p int,
> s int static,
> PRIMARY KEY (k, p)
> );
> INSERT INTO test (k, p) VALUES (1, 1);
> INSERT INTO test (k, p) VALUES (1, 2);
> SELECT DISTINCT k, s FROM test ;
> {code}
> Native clients that set result_page_size in the query message receive
> multiple non-distinct rows back (one per clustering value p in partition k).
> This only reproduces on 2.0.10; it does not appear in 2.1.0.
> It does not appear in cqlsh for 2.0.10 because cqlsh uses Thrift.
> See https://datastax-oss.atlassian.net/browse/PYTHON-164 for background