[ 
https://issues.apache.org/jira/browse/CASSANDRA-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810720#comment-17810720
 ] 

Berenguer Blasi commented on CASSANDRA-19097:
---------------------------------------------

Please check my reasoning:

CASSANDRA_TOKEN_PREGENERATION_DISABLED is a CCM speed optimization, not a 
user-facing product feature. It is not mentioned anywhere in the C* code, only 
in CCM. CCM already contains several code snippets and comments, e.g. 
[here|https://github.com/riptano/ccm/blob/514e3d828a0593fa31c65aff731a8c1eeb87c4e1/ccmlib/cluster.py#L529]
 and 
[here|https://github.com/riptano/ccm/blob/514e3d828a0593fa31c65aff731a8c1eeb87c4e1/ccmlib/cluster.py#L296]
, that guard against this problem, which is known from the past. The token 
collision they mention, "gives identical tokens to several nodes", is exactly 
what can be seen in the logs:

{noformat}
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token -1513687318797282039.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token -2578028308570345435.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token -3836448899713998469.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token -5958143799716233579.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token -7015520316645191796.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token -8365722143219869680.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token -8792008644500999070.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 2183381311695650387.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 306316982182664358.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,258 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 3352336234874664766.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,258 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 4401460035203855854.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,258 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 5337373207995371274.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,258 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 5753010867567011154.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,258 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 6964691214238992299.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,258 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 7362014066587705090.  
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,258 StorageService.java:3139 - Nodes 
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 8620434782533148178.  
Ignoring /127.0.0.2:7000
{noformat}
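
A minimal sketch of how such collisions could be pulled out of a node log, assuming the message format shown in the excerpt above. The function name find_token_collisions and the sample text are illustrative only and are not part of CCM or the dtests:

```python
import re
from collections import defaultdict

# Matches the "have the same token" message from the excerpt above.
# \s+ is used between words so that a message wrapped across lines still matches.
COLLISION_RE = re.compile(
    r"Nodes\s+(\S+)\s+and\s+(\S+)\s+have\s+the\s+same\s+token\s+(-?\d+)\.\s+"
    r"Ignoring\s+(\S+)"
)


def find_token_collisions(log_text):
    """Return {(node_a, node_b): [token, ...]} for every collision message."""
    collisions = defaultdict(list)
    for match in COLLISION_RE.finditer(log_text):
        node_a, node_b, token, _ignored = match.groups()
        collisions[(node_a, node_b)].append(int(token))
    return dict(collisions)


# Two messages copied from the log excerpt, wrapped as they appear there.
sample = """\
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token -1513687318797282039.
Ignoring /127.0.0.2:7000
INFO  [GossipStage:1] 2024-01-18 10:53:33,257 StorageService.java:3139 - Nodes
/127.0.0.2:7000 and /127.0.0.3:7000 have the same token 2183381311695650387.
Ignoring /127.0.0.2:7000
"""

collisions = find_token_collisions(sample)
for (node_a, node_b), tokens in collisions.items():
    print(f"{node_a} and {node_b} share {len(tokens)} token(s)")
```

Grouping by node pair makes it easy to see whether two nodes share a few tokens or their whole token set.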

If we revert the CCM change, we'll suffer a slowdown and we won't test the 
different pre/post-TCM behaviours. Given that the usage of 
CASSANDRA_TOKEN_PREGENERATION_DISABLED is pretty isolated and constrained 
mainly to the bootstrap test, I propose we:
- Merge the current PR with the version guard Markus mentions, so that trunk is 
not affected and still exercises TCM.
- Given how isolated this property's usage is, anybody adding a new test should 
find the added comments. If they don't, we now have a Jira ticket that should 
pop up pretty quickly on any search for the failure stack trace.
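
A rough sketch of what such a version guard could look like, under stated assumptions: the 5.1 threshold for the first TCM version line and the helper names major_minor and pr_change_applies are hypothetical, not taken from the PR. The sketch only decides whether a version is pre-TCM; what the test does on each side of the guard is left to the PR:

```python
# Assumption: the first version line that carries TCM (trunk when this comment
# was written). Adjust if the actual threshold differs.
ASSUMED_FIRST_TCM_VERSION = (5, 1)


def major_minor(version):
    """Turn '4.1.3' into (4, 1) and '5.1-SNAPSHOT' into (5, 1)."""
    base = version.split("-", 1)[0]
    major, minor = base.split(".")[:2]
    return int(major), int(minor)


def pr_change_applies(version):
    """True for pre-TCM release branches, False for trunk, which stays untouched."""
    return major_minor(version) < ASSUMED_FIRST_TCM_VERSION


for v in ("4.0.11", "4.1.3", "5.0-beta1", "5.1-SNAPSHOT"):
    print(v, pr_change_applies(v))
```

In a real dtest the comparison would more likely use the version object the cluster already exposes; plain tuples are used here only to keep the example self-contained.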

Wdyt?

> Test Failure: bootstrap_test.TestBootstrap.*
> --------------------------------------------
>
>                 Key: CASSANDRA-19097
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19097
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CI
>            Reporter: Michael Semb Wever
>            Assignee: Berenguer Blasi
>            Priority: Urgent
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>         Attachments: jenkinslogs.zip
>
>
> test_killed_wiped_node_cannot_join
> test_read_from_bootstrapped_node
> test_shutdown_wiped_node_cannot_join
> Seen in dtests_offheap in CASSANDRA-19034
> https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/258/workflows/cea7d697-ca33-40bb-8914-fb9fc662820a/jobs/21162/parallel-runs/38
> {noformat}
> self = <bootstrap_test.TestBootstrap object at 0x7fc471171d50>
>     def test_killed_wiped_node_cannot_join(self):
> >       self._wiped_node_cannot_join_test(gently=False)
> bootstrap_test.py:608: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> self = <bootstrap_test.TestBootstrap object at 0x7fc471171d50>, gently = False
>     def _wiped_node_cannot_join_test(self, gently):
>         """
>         @jira_ticket CASSANDRA-9765
>         Test that if we stop a node and wipe its data then the node cannot join
>         when it is not a seed. Test both a nice shutdown or a forced shutdown, via
>         the gently parameter.
>         """
>         cluster = self.cluster
>         cluster.set_environment_variable('CASSANDRA_TOKEN_PREGENERATION_DISABLED', 'True')
>         cluster.populate(3)
>         cluster.start()
>     
>         stress_table = 'keyspace1.standard1'
>     
>         # write some data
>         node1 = cluster.nodelist()[0]
>         node1.stress(['write', 'n=10K', 'no-warmup', '-rate', 'threads=8'])
>     
>         session = self.patient_cql_connection(node1)
>         original_rows = list(session.execute("SELECT * FROM {}".format(stress_table,)))
>     
>         # Add a new node, bootstrap=True ensures that it is not a seed
>         node4 = new_node(cluster, bootstrap=True)
>         node4.start(wait_for_binary_proto=True)
>     
>         session = self.patient_cql_connection(node4)
> >       assert original_rows == list(session.execute("SELECT * FROM {}".format(stress_table,)))
> E       assert [Row(key=b'PP...e9\xbb'), ...] == [Row(key=b'PP...e9\xbb'), ...]
> E         At index 587 diff: Row(key=b'OP2656L630', 
> C0=b"E02\xd2\x8clBv\tr\n\xe3\x01\xdd\xf2\x8a\x91\x7f-\x9dm'\xa5\xe7PH\xef\xc1xlO\xab+d",
>  
> C1=b"\xb2\xc0j\xff\xcb'\xe3\xcc\x0b\x93?\x18@\xc4\xc7tV\xb7q\xeeF\x82\xa4\xd3\xdcFl\xd9\x87
>  \x9a\xde\xdc\xa3", 
> C2=b'\xed\xf8\x8d%\xa4\xa6LPs;\x98f\xdb\xca\x913\xba{M\x8d6XW\x01\xea-\xb5<J\x1eo\xa0F\xbe',
>  
> C3=b'\x9ec\xcf\xc7\xec\xa5\x85Z]\xa6\x19\xeb\xc4W\x1d%lyZj\xb9\x94I\x90\xebZ\xdba\xdd\xdc\x9e\x82\x95\x1c',
>  
> C4=b'\xab\x9e\x13\x8b\xc6\x15D\x9b\xccl\xdcX\xb23\xd0\x8b\xa3\xba7\xc1c\xf7F\x1d\xf8e\xbd\x89\xcb\xd8\xd1)f\xdd')
>  != Row(key=b'4LN78NONP0', 
> C0=b"\xdf\x90\xb3/u\xc9/C\xcdOYG3\x070@#\xc3k\xaa$M'\x19\xfb\xab\xc0\x10]\xa6\xac\x1d\x81\xad",
>  
> C1=b'\x8a\xb7j\x95\xf9\xbd?&\x11\xaaH\xcd\x87\xaa\xd2\x85\x08X\xea9\x94\xae8U\x92\xad\xb0\x1b9\xff\x87Z\xe81',
>  
> C2=b'6\x1d\xa1-\xf77\xc7\xde+`\xb7\x89\xaa\xcd\xb5_\xe5\xb3\x04\xc7\xb1\x95e\x81s\t1\x8b\x16sc\x0eMm',
>  
> C3=b'\xfbi\x08;\xc9\x94\x15}r\xfe\x1b\xae5\xf6v\x83\xae\xff\x82\x9b`J\xc2D\xa6k+\xf3\xd3\xff{C\xd0;',
>  
> C4=b'\x8f\x87\x18\x0f\xfa\xadK"\x9e\x96\x87:tiu\xa5\x99\xe1_Ax\xa3\x12\xb4Z\xc9v\xa5\xad\xb8{\xc0\xa3\x93')
> E         Left contains 2830 more items, first extra item: 
> Row(key=b'5N7N172K30', 
> C0=b'Y\x81\xa6\x02\x89\xa0hyp\x00O\xe9kFp$\x86u\xea\n\x7fK\x99\xe1\xf6G\xf77\xf7\xd7\xe1\xc7L\x...0\x87a\x03\xee',
>  
> C4=b'\xe8\xd8\x17\xf3\x14\x16Q\x9d\\jb\xde=\x81\xc1B\x9c;T\xb1\xa2O-\x87zF=\x04`\x04\xbd\xc9\x95\xad')
> E         Full diff:
> E           [
> …
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
