[ 
https://issues.apache.org/jira/browse/CASSANDRA-11126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-11126:
-----------------------------------------
    Status: Patch Available  (was: Awaiting Feedback)

bq. Note that the commands I provided end with the names of the tests to run

Oh, that's my bad. The formatting was such that I thought this was just you 
copy-pasting the output of the test command to show it was running the test in 
question.

Anyway, I still needed to also run the source of the upgrade from a local repo 
to debug what was going on between the nodes (and the most convenient way to do 
so is to just have 2 different checkouts that the upgrade test directly uses 
without having to commit to a branch every time), so I went with my manual 
solution.

And so far, I've found 2 problems:
# the first seems to be a small logic issue when serializing distinct queries 
for old nodes. Namely, the {{CompositeToGroup}} we set depends on whether the 
query selects static columns, but the condition for that was wrong: it was using 
a {{||}} when it should have been using a {{&&}}. As a result, we didn't set 
{{compositeToGroup = -2}} in cases where we should have (because the 2nd 
condition was almost always true).
# the second is when building the paging state for protocol version 3. If the 
last row of the page, the one we use for the state, had no cells, we were not 
sending a proper cellName, while 2.x always expects one. What we should send in 
that case is the cellname of the row marker. That problem had 2 possible 
consequences, depending on which node the "corrupted" paging state is sent to:
** if to a 2.1/2.2 node, then it would throw the following exception:
{noformat}
java.lang.ClassCastException: org.apache.cassandra.db.composites.Composites$EmptyComposite cannot be cast to org.apache.cassandra.db.composites.CellName
        at org.apache.cassandra.db.composites.AbstractCellNameType.cellFromByteBuffer(AbstractCellNameType.java:188) ~[main/:na]
        at org.apache.cassandra.service.pager.RangeSliceQueryPager.<init>(RangeSliceQueryPager.java:64) ~[main/:na]
        at org.apache.cassandra.service.pager.QueryPagers.pager(QueryPagers.java:115) ~[main/:na]
        at org.apache.cassandra.service.pager.QueryPagers.pager(QueryPagers.java:126) ~[main/:na]
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:183) ~[main/:na]
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:76) ~[main/:na]
        ...
{noformat}
because what's sent is not a proper cellname.
** if to a 3.x node, then it would misinterpret this, basically re-including 
the corresponding row even though it had been returned already, and that's the 
{{9 != 8}} error from the description.
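
To make the first problem concrete, here is a minimal Java sketch of the {{||}} versus {{&&}} mistake. It is not the actual patch: the class, method and parameter names ({{hasStaticColumns}}, {{slicesSelectStatics}}) and the {{OTHER_GROUPING}} placeholder are hypothetical, and it assumes {{-2}} is the value picked when the query does not select statics, which is how the description above reads.

```java
// Illustrative sketch only; names and the OTHER_GROUPING value are hypothetical.
class CompositesToGroupSketch {
    // Value mentioned in the comment above.
    static final int DISTINCT_NO_STATICS = -2;
    // Placeholder for "whatever is used otherwise" (assumption, not the real value).
    static final int OTHER_GROUPING = 0;

    // Buggy form: true as soon as either condition holds, so if the second
    // condition is almost always true, the result is almost always true.
    static boolean selectsStaticsBuggy(boolean hasStaticColumns, boolean slicesSelectStatics) {
        return hasStaticColumns || slicesSelectStatics;
    }

    // Fixed form: both conditions must hold.
    static boolean selectsStaticsFixed(boolean hasStaticColumns, boolean slicesSelectStatics) {
        return hasStaticColumns && slicesSelectStatics;
    }

    // Assumed mapping: -2 only when the query does not select statics.
    static int compositesToGroup(boolean selectsStatics) {
        return selectsStatics ? OTHER_GROUPING : DISTINCT_NO_STATICS;
    }

    public static void main(String[] args) {
        // No static columns are fetched, but the second condition is true.
        boolean hasStaticColumns = false;
        boolean slicesSelectStatics = true;
        int buggy = compositesToGroup(selectsStaticsBuggy(hasStaticColumns, slicesSelectStatics));
        int fixed = compositesToGroup(selectsStaticsFixed(hasStaticColumns, slicesSelectStatics));
        System.out.println("buggy=" + buggy + " fixed=" + fixed);
    }
}
```

With the inputs in {{main}}, only the {{&&}} form ends up with {{-2}}.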

| [11126-3.0|https://github.com/pcmanus/cassandra/commits/11126-3.0] | [utests|http://cassci.datastax.com/job/pcmanus-11126-3.0-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-11126-3.0-dtest] |

The test seems to pass reliably on my local box with those fixes, but again, I 
didn't run exactly the test from the upgrade tests.
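
For the paging state fix, the idea can be sketched roughly as follows. This is only an illustration with strings standing in for composites: {{lastCellName}} and its arguments are hypothetical, and it assumes a 2.x cell name is the row's clustering values followed by a column name, with the row marker using an empty column name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; strings stand in for real cell names.
class PagingStateCellNameSketch {
    // Returns the name to record in the paging state for the last row of a page.
    // clustering: the row's clustering values (hypothetical string form).
    // columnsWithCells: columns of that row that actually have a cell.
    static String lastCellName(String clustering, List<String> columnsWithCells) {
        // If the row has no cells, fall back to the row marker's name
        // (clustering plus an empty column name) rather than an empty name.
        String column = columnsWithCells.isEmpty()
                      ? ""
                      : columnsWithCells.get(columnsWithCells.size() - 1);
        return clustering + ":" + column;
    }

    public static void main(String[] args) {
        System.out.println(lastCellName("k1", Arrays.asList("a", "b")));
        System.out.println(lastCellName("k1", Collections.<String>emptyList()));
    }
}
```

The point is that the name is never empty, even for a row without cells, so a 2.x node can always decode it as a cell name.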

> select_distinct_with_deletions_test failing on non-vnode environments
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11126
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11126
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan McGuire
>            Assignee: Sylvain Lebresne
>              Labels: dtest
>             Fix For: 3.0.x
>
>
> Looks like this was fixed in CASSANDRA-10762, but not for non-vnode 
> environments:
> {code}
> $ DISABLE_VNODES=yes KEEP_TEST_DIR=yes CASSANDRA_VERSION=git:cassandra-3.0 
> PRINT_DEBUG=true nosetests -s -v 
> upgrade_tests/cql_tests.py:TestCQLNodes2RF1.select_distinct_with_deletions_test
> select_distinct_with_deletions_test 
> (upgrade_tests.cql_tests.TestCQLNodes2RF1) ... cluster ccm directory: 
> /tmp/dtest-UXb0un
> http://git-wip-us.apache.org/repos/asf/cassandra.git git:cassandra-3.0
> Custom init_config not found. Setting defaults.
> Done setting configuration options:
> {   'num_tokens': None,
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> getting default job version for 3.0.3
> UpgradePath(starting_version='binary:2.2.3', upgrade_version=None)
> starting from 2.2.3
> upgrading to {'install_dir': 
> '/home/ryan/.ccm/repository/gitCOLONcassandra-3.0'}
> Querying upgraded node
> FAIL
> ======================================================================
> FAIL: select_distinct_with_deletions_test 
> (upgrade_tests.cql_tests.TestCQLNodes2RF1)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/ryan/git/datastax/cassandra-dtest/upgrade_tests/cql_tests.py", 
> line 3360, in select_distinct_with_deletions_test
>     self.assertEqual(9, len(rows))
> AssertionError: 9 != 8
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-UXb0un
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> dtest: DEBUG: getting default job version for 3.0.3
> dtest: DEBUG: UpgradePath(starting_version='binary:2.2.3', 
> upgrade_version=None)
> dtest: DEBUG: starting from 2.2.3
> dtest: DEBUG: upgrading to {'install_dir': 
> '/home/ryan/.ccm/repository/gitCOLONcassandra-3.0'}
> dtest: DEBUG: Querying upgraded node
> --------------------- >> end captured logging << ---------------------
> ----------------------------------------------------------------------
> Ran 1 test in 56.022s
> FAILED (failures=1)
> {code}



