[ https://issues.apache.org/jira/browse/CASSANDRA-11195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455573#comment-15455573 ]
Benjamin Lerer commented on CASSANDRA-11195:
--------------------------------------------
The problem occurs only when static columns are involved and the coordinator
is a 3.x node querying a 2.x node.
If 2 partitions are within the same token range on the 2.x node and the
previous page ended at the end of the first partition, then for the next page
the node returns an extra row for the first partition that contains only
static data. The 3.x coordinator ignored that row, but because one row was then
missing from the page, the coordinator concluded that all the data had been
returned.
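A toy model can make the failure mode concrete. This Python sketch is not Cassandra code: `Row`, `DATA`, `legacy_fetch` and `page_all_buggy` are hypothetical names, and it assumes both partitions have static columns and sit in the same token range on the 2.x node. The simulated 2.x node emits a static-only row when it resumes exactly at a partition boundary, and the simulated 3.x coordinator drops that row and treats the resulting short page as the last one.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    partition: str
    clustering: Optional[int]  # None marks a static-only row

    @property
    def static_only(self) -> bool:
        return self.clustering is None


# Hypothetical data: partitions "a" and "b" both have static columns and
# sit in the same token range on the 2.x node.
DATA = [Row("a", 1), Row("a", 2), Row("b", 1), Row("b", 2)]


def legacy_fetch(after, limit):
    """Simulated 2.x node: up to `limit` rows strictly after `after`
    (None means from the start). If `after` is the last row of its
    partition, a static-only row for that partition is emitted first."""
    i = 0 if after is None else DATA.index(after) + 1
    out = []
    if after is not None and (i == len(DATA) or DATA[i].partition != after.partition):
        out.append(Row(after.partition, None))  # the extra static-only row
    while len(out) < limit and i < len(DATA):
        out.append(DATA[i])
        i += 1
    return out


def page_all_buggy(page_size):
    """Simulated 3.x coordinator before the fix."""
    result, last = [], None
    while True:
        # Static-only rows from the 2.x node are ignored...
        page = [r for r in legacy_fetch(last, page_size) if not r.static_only]
        result.extend(page)
        # ...so a page can come back one row short, which is wrongly
        # taken to mean that all the data has been returned.
        if len(page) < page_size:
            return result
        last = page[-1]


print(page_all_buggy(2))
```

With a page size of 2, `page_all_buggy(2)` returns only the first three rows: the second page holds the static-only row plus `Row("b", 1)`, the static-only row is dropped, and `Row("b", 2)` is never fetched.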
I created a patch that makes the 3.x version request the same data as 2.x
does: it includes the last row returned and requests pageSize + 1 rows.
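Under a similar hypothetical model (a sketch, not the actual patch), the fix can be pictured like this: after the first page, the coordinator resumes inclusively at the last row it returned and asks for `page_size + 1` rows, so the 2.x node restarts at that row instead of emitting a static-only row. `Row`, `DATA`, `legacy_fetch` and `page_all_fixed` are illustrative names. Discarding the repeated first row is an assumption about how the duplicate is handled; the comment only states that the last row is included and `pageSize + 1` rows are requested.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    partition: str
    clustering: Optional[int]  # None marks a static-only row


# Hypothetical data: two partitions with static columns in the same
# token range on the 2.x node.
DATA = [Row("a", 1), Row("a", 2), Row("b", 1), Row("b", 2)]


def legacy_fetch(start, inclusive, limit):
    """Simulated 2.x node. An exclusive start that ends its partition
    triggers an extra static-only row; an inclusive start resumes at
    `start` itself, so no static-only row is produced."""
    i = 0 if start is None else DATA.index(start) + (0 if inclusive else 1)
    out = []
    if start is not None and not inclusive and (
        i == len(DATA) or DATA[i].partition != start.partition
    ):
        out.append(Row(start.partition, None))
    while len(out) < limit and i < len(DATA):
        out.append(DATA[i])
        i += 1
    return out


def page_all_fixed(page_size):
    """Simulated 3.x coordinator that pages the way 2.x does."""
    result, last = [], None
    while True:
        if last is None:
            rows = legacy_fetch(None, inclusive=True, limit=page_size)
        else:
            # Include the last returned row and ask for one extra row.
            rows = legacy_fetch(last, inclusive=True, limit=page_size + 1)
            if rows and rows[0] == last:
                rows = rows[1:]  # drop the row already returned
        page = [r for r in rows if r.clustering is not None][:page_size]
        result.extend(page)
        if len(page) < page_size:
            return result
        last = page[-1]


print(page_all_fixed(2))
```

In this model every page size from 1 to 5 yields all four rows. The extra requested row compensates for the slot spent on the repeated row, so a page is only short when the data really is exhausted.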
I pushed patches for
[3.0|https://github.com/blerer/cassandra/tree/11195-3.0] and
[3.9|https://github.com/blerer/cassandra/tree/11195-3.9]. They look OK from
the CI point of view, but we should test them with the upgrade tests, which do
not seem to run on CI.
[~rhatch] could you test whether the patch solves the problem with virtual
nodes? To be honest, I do not think it is the same problem, as the chances of
having 2 partitions in the same token range with virtual nodes are pretty small.
> paging may returns incomplete results on small page size
> --------------------------------------------------------
>
> Key: CASSANDRA-11195
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11195
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jim Witschey
> Assignee: Benjamin Lerer
> Labels: dtest
> Attachments: allfiles.tar.gz, node1.log, node1_debug.log, node2.log,
> node2_debug.log
>
>
> This was found through a flaky test, and running that test is still the
> easiest way to reproduce the issue. On CI we're seeing a 40-50% failure rate,
> but locally this test fails much less frequently.
> If I attach a Python debugger and re-run the "bad" query, it keeps returning
> incomplete data indefinitely. If I go directly to cqlsh, I can see all rows
> just fine.