[ 
https://issues.apache.org/jira/browse/CASSANDRA-18935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790581#comment-17790581
 ] 

Stefan Miklosovic edited comment on CASSANDRA-18935 at 11/28/23 1:50 PM:
-------------------------------------------------------------------------

It seems that the byteman script we inject into nodes 1 and 3 in 
test_counter_leader_with_partial does not work. If you look closely, the rule 
tries to intercept a method called "findSuitableEndpoint" (1), but there is no 
such method in 4.0 and later. So how does this work then?

findSuitableEndpoint is present only in 3.0 and 3.11. For these versions, a 
different byteman script is injected (2).

So it seems that the injection does its job for 3.0 and 3.11 only. For 4.0 and 
later, it does not inject anything. 

The question is: if it is not injected, how can the test still pass? Without 
the injection, the test follows the logic currently present in 4.0+: a replica 
can become a leader only if RPC is ready for it and it is not marked as dead 
by FailureDetector. So it seems that when we stop the second node in (3), the 
two remaining nodes detect this and never send any mutation to that node, 
because it is filtered out. The byteman script, which would force its 
selection if it were not filtered out, does not work anyway. So the node with 
a partial view of the cluster is never selected.
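
To illustrate the filtering described above, here is a minimal Python sketch. 
It is a simplified model, not Cassandra's actual code: the Replica class, its 
rpc_ready and alive fields, and the leader_candidates function are all 
hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical model of a replica as seen by a coordinator."""
    name: str
    rpc_ready: bool  # stands in for the RPC-ready flag
    alive: bool      # stands in for the FailureDetector's verdict


def leader_candidates(replicas):
    """Keep only replicas that are RPC-ready and not marked as dead."""
    return [r for r in replicas if r.rpc_ready and r.alive]


# Three-node cluster in which node2 has been stopped, so the other
# nodes see it as dead.
cluster = [
    Replica("node1", rpc_ready=True, alive=True),
    Replica("node2", rpc_ready=True, alive=False),
    Replica("node3", rpc_ready=True, alive=True),
]

print([r.name for r in leader_candidates(cluster)])  # ['node1', 'node3']
```

In this model the stopped node never appears among the candidates, which 
would explain why the test passes even though the byteman rule is not applied.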

I think we need to do this to fix the tests everywhere 
[https://github.com/apache/cassandra-dtest/pull/246] (this will go together 
with 19103)

The byteman injection after we stopped the node is not necessary in trunk 
because there is TCM machinery in place.

(1) 
[https://github.com/apache/cassandra-dtest/blob/trunk/byteman/4.0/election_counter_leader_favor_node2.btm#L8]
(2) 
[https://github.com/apache/cassandra-dtest/blob/trunk/byteman/pre4.0/election_counter_leader_favor_node2.btm#L8]
(3) 
[https://github.com/apache/cassandra-dtest/blob/trunk/counter_test.py#L117-L119]
(3) 
[https://github.com/apache/cassandra-dtest/blob/trunk/counter_test.py#L110-L132]
(4) 
[https://github.com/apache/cassandra-dtest/blob/trunk/byteman/4.0/election_counter_leader_favor_node2.btm#L10]
(5) [https://github.com/apache/cassandra-dtest/blob/trunk/counter_test.py#L117]



> Fix nodetool enable/disablebinary to correctly set rpc
> ------------------------------------------------------
>
>                 Key: CASSANDRA-18935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18935
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core, Legacy/CQL
>            Reporter: Cameron Zemek
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>         Attachments: 18935-3.11.patch, image-2023-11-16-10-56-16-693.png, 
> signature.asc
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
>  
> {code:java}
>         if ((nativeFlag != null && Boolean.parseBoolean(nativeFlag)) || 
> (nativeFlag == null && DatabaseDescriptor.startNativeTransport()))
>         {
>             startNativeTransport();
>             StorageService.instance.setRpcReady(true);
>         } {code}
> The startup code here only sets RpcReady if native transport is enabled. If 
> you call 
> {code:java}
> nodetool enablebinary{code}
> then this flag doesn't get set.
> But with the change from CASSANDRA-13043 it requires RpcReady set to true in 
> order to get a leader for the counter update.
> Not sure what the correct fix is here, seems to only really use this flag for 
> counters. So thinking perhaps the fix is to just move this outside the if 
> condition.
>  


