So one time I tried to understand why only a single node could have a
given token, and it appeared that the restriction came over the fence
from Facebook and has been kept ever since. Personally I don't think
it's necessary, and I agree that it is kind of problematic (but there's
probably lots of stuff that relies on it now). Multiple DCs are one
example, but the same could apply to racks: there's no real reason
(with NTS) that two nodes in separate racks can't have the same token.
In fact, being able to do this would make token allocation much
simpler, and smart allocation algorithms could work much better with
vnodes.
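
To illustrate the idea, here is a toy sketch in Python (my own sketch, not
Cassandra's code; the IPs are made up): give each DC its own ring, and the
same token appearing in two DCs (or two racks) is never ambiguous, because
NTS-style placement only ever consults one DC's ring at a time (rack
awareness is ignored here for brevity):

import bisect

# One ring per datacenter; the token values deliberately repeat across
# DCs, which causes no clash because lookups are per-DC.
rings = {
    "dc1": [(-9223372036854775808, "10.0.1.1"),
            (-3074457345618258603, "10.0.1.2"),
            ( 3074457345618258602, "10.0.1.3")],
    "dc2": [(-9223372036854775808, "10.0.2.1"),  # same tokens as dc1
            (-3074457345618258603, "10.0.2.2"),
            ( 3074457345618258602, "10.0.2.3")],
}

def replicas(key_token, rf_per_dc):
    result = []
    for dc, rf in rf_per_dc.items():
        ring = rings[dc]
        # find the first node whose token is >= the key's token,
        # then take the next rf nodes clockwise within this DC
        i = bisect.bisect_left([t for t, _ in ring], key_token)
        result += [ring[(i + n) % len(ring)][1] for n in range(rf)]
    return result

print(replicas(0, {"dc1": 2, "dc2": 2}))
# ['10.0.1.3', '10.0.1.1', '10.0.2.3', '10.0.2.1']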

On 1 February 2018 at 17:35, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On Thu, Feb 1, 2018 at 5:19 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>>
>>> The reason I find it surprising is that it makes very little *sense* to
>>> put a token belonging to a node from one DC between tokens of nodes from
>>> another one.
>>>
>>
>> I don't want to really turn this into an argument over what should and
>> shouldn't make sense, but I do agree: it doesn't make sense to put a token
>> from a node in one DC onto a node in another DC.
>>
>
> This is not what I was trying to say.  I should have used an example to
> express myself more clearly.  Here goes (disclaimer: it might sound like a
> rant, take it with a grain of salt):
>
> $ ccm create -v 3.0.15 -n 3:3 -s 2dcs
>
> For a more meaningful multi-DC setup than the default SimpleStrategy, use
> NTS:
>
> $ ccm node1 cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
> {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};"
>
> $ ccm node1 nodetool ring
>
> Datacenter: dc1
> ==========
> Address    Rack  Status  State   Load       Owns    Token
>                                                     3074457345618258602
> 127.0.0.1  r1    Up      Normal  117.9 KB   66.67%  -9223372036854775808
> 127.0.0.2  r1    Up      Normal  131.56 KB  66.67%  -3074457345618258603
> 127.0.0.3  r1    Up      Normal  117.88 KB  66.67%  3074457345618258602
>
> Datacenter: dc2
> ==========
> Address    Rack  Status  State   Load       Owns    Token
>                                                     3074457345618258702
> 127.0.0.4  r1    Up      Normal  121.54 KB  66.67%  -9223372036854775708
> 127.0.0.5  r1    Up      Normal  118.59 KB  66.67%  -3074457345618258503
> 127.0.0.6  r1    Up      Normal  114.12 KB  66.67%  3074457345618258702
>
> Note that CCM is aware of the cross-DC clashes and selects the tokens for
> dc2 shifted by 100.
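>
> As an aside, the values themselves are just the Murmur3 range divided
> evenly, with each additional DC shifted by the fixed offset.  A toy
> reproduction in Python (my sketch, not ccm's actual code):
>
> def tokens(num_nodes, dc_index, offset=100):
>     # spread tokens evenly over the Murmur3 range [-2**63, 2**63),
>     # shifting each additional DC by a small fixed offset
>     return [-2**63 + 2**64 * i // num_nodes + dc_index * offset
>             for i in range(num_nodes)]
>
> print(tokens(3, 0))  # the dc1 tokens above
> print(tokens(3, 1))  # the dc2 tokens above, each shifted by 100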
>
> Then look at the token ring (output abbreviated and aligned by me):
>
> $ ccm node1 nodetool describering system_auth
>
> Schema Version:4f7d0ad0-350d-3ea0-ae8b-53d5bc34fc7e
> TokenRange:
> TokenRange(start_token:-9223372036854775808, end_token:-9223372036854775708, endpoints:[127.0.0.4, 127.0.0.2, 127.0.0.5, 127.0.0.3], ...
> TokenRange(start_token:-9223372036854775708, end_token:-3074457345618258603, endpoints:[127.0.0.2, 127.0.0.5, 127.0.0.3, 127.0.0.6], ...
> TokenRange(start_token:-3074457345618258603, end_token:-3074457345618258503, endpoints:[127.0.0.5, 127.0.0.3, 127.0.0.6, 127.0.0.1], ...
> TokenRange(start_token:-3074457345618258503, end_token: 3074457345618258602, endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.1, 127.0.0.4], ...
> TokenRange(start_token: 3074457345618258602, end_token: 3074457345618258702, endpoints:[127.0.0.6, 127.0.0.1, 127.0.0.4, 127.0.0.2], ...
> TokenRange(start_token: 3074457345618258702, end_token:-9223372036854775808, endpoints:[127.0.0.1, 127.0.0.4, 127.0.0.2, 127.0.0.5], ...
>
> So in this setup, every token range has one end contributed by a node from
> dc1 and the other end by a node from dc2.  That doesn't model anything in
> the real topology of the cluster.
>
> I see that it's easy to lump together tokens from all nodes and sort them,
> to produce a single token ring (and this is obviously the reason why tokens
> have to be unique throughout the cluster as a whole).  That doesn't mean
> it's a meaningful thing to do.
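>
> To make that concrete (a Python toy, not Cassandra's code), the global
> ring is literally just this merge-and-sort, and walking it shows that
> every range straddles the two DCs:
>
> dc1 = [-9223372036854775808, -3074457345618258603, 3074457345618258602]
> dc2 = [t + 100 for t in dc1]   # ccm's +100 shift
> ring = sorted(dc1 + dc2)       # one global ring, DCs interleaved
> for start, end in zip(ring, ring[1:] + ring[:1]):
>     start_dc = "dc2" if start in dc2 else "dc1"
>     end_dc = "dc2" if end in dc2 else "dc1"
>     print(f"({start}, {end}]: starts in {start_dc}, ends in {end_dc}")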
>
> This introduces complexity which is not present in the problem domain in
> the first place.  This was a deliberate choice of the developers, dare I
> say, to complect the separate DCs together in a single token ring.  This
> has profound consequences on the operations side.  If anything, it prevents
> bootstrapping multiple nodes at the same time, even when they are in
> different DCs.  Or would you suggest setting
> -Dcassandra.consistent.rangemovement=false and hoping it will work out?
>
> If the whole reason for having separate DCs is to provide isolation, I
> fail to see how the single token ring design does anything towards
> achieving that.
>
>> But also being very clear (I want to make sure I understand what you're
>> saying): that's a manual thing you did, Cassandra didn't do it for you,
>> right? The fact that Cassandra didn't STOP you from doing it could be
>> considered a bug, but YOU made that config choice?
>>
>
> Yes, we chose exactly the same token for two nodes in different DCs
> because we were unaware of this global uniqueness requirement.  And yes,
> we believe it's a bug that Cassandra didn't stop us from doing that.
>
>> You can trivially predict what would happen with SimpleStrategy in
>> multi-DC: run nodetool ring, and the first RF nodes listed after a given
>> token own that data, regardless of which DC they're in. Because it's all
>> one big ring.
>
>
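> That rule is indeed easy to sketch (toy Python, not Cassandra's actual
> implementation): walk the one sorted ring from the key's token and take
> the next RF nodes, with no regard for DCs:
>
> import bisect
>
> # all six nodes of the ccm cluster above, in a single ring
> ring = {
>     -9223372036854775808: "127.0.0.1", -9223372036854775708: "127.0.0.4",
>     -3074457345618258603: "127.0.0.2", -3074457345618258503: "127.0.0.5",
>      3074457345618258602: "127.0.0.3",  3074457345618258702: "127.0.0.6",
> }
>
> def simple_strategy_replicas(key_token, rf):
>     tokens = sorted(ring)
>     # first node whose token is >= the key's token, then the next rf-1
>     i = bisect.bisect_left(tokens, key_token)
>     return [ring[tokens[(i + n) % len(tokens)]] for n in range(rf)]
>
> print(simple_strategy_replicas(0, 3))
> # ['127.0.0.3', '127.0.0.6', '127.0.0.1']: two dc1 nodes, one dc2 node
>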
> In any case, I don't think SimpleStrategy is a valid argument to consider
> in a multi-DC setup.  It is true that you can start a cluster spanning
> multiple DCs from scratch while using SimpleStrategy, but there is no way
> to add a new DC to the cluster unless you go NTS, so why bring up this
> example?
>
> Cheers,
> --
> Alex
>
>
