On Thu, Feb 1, 2018 at 5:19 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> The reason I find it surprising, is that it makes very little *sense* to
>> put a token belonging to a node from one DC between tokens of nodes from
>> another one.
>>
>
> I don't want to really turn this into an argument over what should and
> shouldn't make sense, but I do agree, it doesn't make sense to put a token
> on one node in one DC onto another node in another DC.
>
This is not what I was trying to say. I should have used an example to
express myself more clearly. Here goes (disclaimer: it might sound like a
rant, so take it with a grain of salt):

$ ccm create -v 3.0.15 -n 3:3 -s 2dcs

For a more meaningful multi-DC setup than the default SimpleStrategy, use
NTS:

$ ccm node1 cqlsh -e "ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};"

$ ccm node1 nodetool ring

Datacenter: dc1
==========
Address    Rack  Status  State   Load       Owns    Token
                                                    3074457345618258602
127.0.0.1  r1    Up      Normal  117.9 KB   66.67%  -9223372036854775808
127.0.0.2  r1    Up      Normal  131.56 KB  66.67%  -3074457345618258603
127.0.0.3  r1    Up      Normal  117.88 KB  66.67%  3074457345618258602

Datacenter: dc2
==========
Address    Rack  Status  State   Load       Owns    Token
                                                    3074457345618258702
127.0.0.4  r1    Up      Normal  121.54 KB  66.67%  -9223372036854775708
127.0.0.5  r1    Up      Normal  118.59 KB  66.67%  -3074457345618258503
127.0.0.6  r1    Up      Normal  114.12 KB  66.67%  3074457345618258702

Note that CCM is aware of the cross-DC clashes and selects the tokens for
dc2 shifted by 100.

Then look at the token ring (output abbreviated and aligned by me):

$ ccm node1 nodetool describering system_auth
Schema Version:4f7d0ad0-350d-3ea0-ae8b-53d5bc34fc7e
TokenRange:
    TokenRange(start_token:-9223372036854775808, end_token:-9223372036854775708, endpoints:[127.0.0.4, 127.0.0.2, 127.0.0.5, 127.0.0.3], ...
    TokenRange(start_token:-9223372036854775708, end_token:-3074457345618258603, endpoints:[127.0.0.2, 127.0.0.5, 127.0.0.3, 127.0.0.6], ...
    TokenRange(start_token:-3074457345618258603, end_token:-3074457345618258503, endpoints:[127.0.0.5, 127.0.0.3, 127.0.0.6, 127.0.0.1], ...
    TokenRange(start_token:-3074457345618258503, end_token: 3074457345618258602, endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.1, 127.0.0.4], ...
    TokenRange(start_token: 3074457345618258602, end_token: 3074457345618258702, endpoints:[127.0.0.6, 127.0.0.1, 127.0.0.4, 127.0.0.2], ...
    TokenRange(start_token: 3074457345618258702, end_token:-9223372036854775808, endpoints:[127.0.0.1, 127.0.0.4, 127.0.0.2, 127.0.0.5], ...

So in this setup, every token range has one end contributed by a node from
dc1 and the other end by a node from dc2. That doesn't model anything in
the real topology of the cluster.

I see that it's easy to lump together tokens from all nodes and sort them
to produce a single token ring (and this is obviously the reason why tokens
have to be unique throughout the cluster as a whole). That doesn't mean
it's a meaningful thing to do. It introduces complexity which is not
present in the problem domain initially. It was a deliberate choice of the
developers, dare I say, to complect the separate DCs together in a single
token ring.

This has profound consequences on the operations side. If anything, it
prevents bootstrapping multiple nodes at the same time even if they are in
different DCs. Or would you suggest setting consistent_range_movement=false
and hoping it will work out?

If the whole reason for having separate DCs is to provide isolation, I fail
to see how the single token ring design does anything towards achieving
that.

> But also being very clear (I want to make sure I understand what you're
> saying): that's a manual thing you did, Cassandra didn't do it for you,
> right? The fact that Cassandra didn't STOP you from doing it could be
> considered a bug, but YOU made that config choice?

Yes, we have chosen exactly the same token for two nodes in different DCs
because we were unaware of this global uniqueness requirement. Yes, we
believe it's a bug that Cassandra didn't stop us from doing that.

> You can trivially predict what would happen with SimpleStrategy in
> multi-DC: run nodetool ring, and the first RF nodes listed after a given
> token own that data, regardless of which DC they're in. Because it's all
> one big ring.

In any case I don't think SimpleStrategy is a valid argument to consider in
a multi-DC setup.
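As an aside, the even token spacing (and the +100 offset ccm picks for dc2)
is easy to reproduce. Here is a minimal Python sketch of the arithmetic --
my own illustration, not ccm's actual code:

```python
# Evenly spaced Murmur3 tokens for num_nodes nodes, with an optional
# per-DC offset to avoid cross-DC token clashes (mirrors the ring output
# above; names are mine, for illustration).
RING_SIZE = 2**64
MIN_TOKEN = -2**63

def evenly_spaced_tokens(num_nodes, offset=0):
    return [MIN_TOKEN + (RING_SIZE // num_nodes) * i + offset
            for i in range(num_nodes)]

print(evenly_spaced_tokens(3))       # dc1: tokens from the ring output
print(evenly_spaced_tokens(3, 100))  # dc2: the same tokens shifted by 100
```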
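To see why every range ends up with one end in each DC, it's enough to lump
the two token lists together and sort them, the way a single global ring
does. A quick sketch (again my own illustration):

```python
# Merge the tokens of both DCs into one sorted ring, then check which DC
# contributes each end of every resulting token range.
dc_of_token = {
    -9223372036854775808: 'dc1', -3074457345618258603: 'dc1',
     3074457345618258602: 'dc1',
    -9223372036854775708: 'dc2', -3074457345618258503: 'dc2',
     3074457345618258702: 'dc2',
}
ring = sorted(dc_of_token)
ranges = list(zip(ring, ring[1:] + ring[:1]))  # last range wraps around

# With the per-DC tokens interleaved like this, no range stays within
# one DC: every start/end pair comes from different DCs.
assert all(dc_of_token[start] != dc_of_token[end] for start, end in ranges)
```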
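The endpoints lists in the describering output also follow mechanically from
the single ring: walk it clockwise from the range's end token and take nodes
until every DC has its RF. A simplified sketch of that selection (one rack
per DC; this is my approximation of NTS, not Cassandra's code):

```python
# Simplified NetworkTopologyStrategy replica selection for
# {'dc1': 2, 'dc2': 2} with a single rack per DC: walk the ring clockwise
# from the range's end token, taking each node until its DC has 2 replicas.
ring = [  # (token, node, dc), sorted by token
    (-9223372036854775808, '127.0.0.1', 'dc1'),
    (-9223372036854775708, '127.0.0.4', 'dc2'),
    (-3074457345618258603, '127.0.0.2', 'dc1'),
    (-3074457345618258503, '127.0.0.5', 'dc2'),
    (3074457345618258602, '127.0.0.3', 'dc1'),
    (3074457345618258702, '127.0.0.6', 'dc2'),
]

def replicas(end_token, rf_per_dc=2):
    start = next(i for i, (t, _, _) in enumerate(ring) if t == end_token)
    picked, per_dc = [], {'dc1': 0, 'dc2': 0}
    for step in range(len(ring)):
        _, node, dc = ring[(start + step) % len(ring)]
        if per_dc[dc] < rf_per_dc:
            picked.append(node)
            per_dc[dc] += 1
    return picked

# Matches the first TokenRange above:
print(replicas(-9223372036854775708))
# ['127.0.0.4', '127.0.0.2', '127.0.0.5', '127.0.0.3']
```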
It is true that you can start a cluster spanning multiple DCs from scratch
while using SimpleStrategy, but there is no way to add a new DC to the
cluster unless you go NTS, so why bring up this example?

Cheers,
--
Alex