On Thu, Feb 1, 2018 at 5:19 AM, Jeff Jirsa <jji...@gmail.com> wrote:

>
>> The reason I find it surprising, is that it makes very little *sense* to
>> put a token belonging to a node from one DC between tokens of nodes from
>> another one.
>>
>
> I don't want to really turn this into an argument over what should and
> shouldn't make sense, but I do agree, it doesn't make sense to put a token
> on one node in one DC onto another node in another DC.
>

This is not what I was trying to say.  I should have used an example to
express myself more clearly.  Here goes (disclaimer: it might sound like a
rant, so take it with a grain of salt):

$ ccm create -v 3.0.15 -n 3:3 -s 2dcs

For a more meaningful multi-DC setup than the default SimpleStrategy, use
NTS:

$ ccm node1 cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
{'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};"

$ ccm node1 nodetool ring

Datacenter: dc1
==========
Address    Rack  Status  State   Load       Owns    Token
                                                    3074457345618258602
127.0.0.1  r1    Up      Normal  117.9 KB   66.67%  -9223372036854775808
127.0.0.2  r1    Up      Normal  131.56 KB  66.67%  -3074457345618258603
127.0.0.3  r1    Up      Normal  117.88 KB  66.67%  3074457345618258602

Datacenter: dc2
==========
Address    Rack  Status  State   Load       Owns    Token
                                                    3074457345618258702
127.0.0.4  r1    Up      Normal  121.54 KB  66.67%  -9223372036854775708
127.0.0.5  r1    Up      Normal  118.59 KB  66.67%  -3074457345618258503
127.0.0.6  r1    Up      Normal  114.12 KB  66.67%  3074457345618258702

Note that CCM is aware of the potential cross-DC clashes and selects the
tokens for dc2 shifted by 100.
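
I have not checked ccm's code, so take this as a guess, but the tokens above
are consistent with a simple per-DC even spread plus a per-DC offset of 100,
roughly like this Python sketch (the function name is mine):

# Rough reconstruction of how ccm appears to pick Murmur3 tokens: spread the
# nodes of each DC evenly over the token range, then shift every token of the
# i-th DC by i * 100 so no two DCs ever pick exactly the same value.
def ccm_like_tokens(nodes_per_dc, dc_index):
    step = 2**64 // nodes_per_dc
    return [-2**63 + i * step + dc_index * 100 for i in range(nodes_per_dc)]

print(ccm_like_tokens(3, 0))
# dc1: [-9223372036854775808, -3074457345618258603, 3074457345618258602]
print(ccm_like_tokens(3, 1))
# dc2: [-9223372036854775708, -3074457345618258503, 3074457345618258702]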

Then look at the token ring (output abbreviated and aligned by me):

$ ccm node1 nodetool describering system_auth

Schema Version:4f7d0ad0-350d-3ea0-ae8b-53d5bc34fc7e
TokenRange:
TokenRange(start_token:-9223372036854775808, end_token:-9223372036854775708, endpoints:[127.0.0.4, 127.0.0.2, 127.0.0.5, 127.0.0.3], ...
TokenRange(start_token:-9223372036854775708, end_token:-3074457345618258603, endpoints:[127.0.0.2, 127.0.0.5, 127.0.0.3, 127.0.0.6], ...
TokenRange(start_token:-3074457345618258603, end_token:-3074457345618258503, endpoints:[127.0.0.5, 127.0.0.3, 127.0.0.6, 127.0.0.1], ...
TokenRange(start_token:-3074457345618258503, end_token: 3074457345618258602, endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.1, 127.0.0.4], ...
TokenRange(start_token: 3074457345618258602, end_token: 3074457345618258702, endpoints:[127.0.0.6, 127.0.0.1, 127.0.0.4, 127.0.0.2], ...
TokenRange(start_token: 3074457345618258702, end_token:-9223372036854775808, endpoints:[127.0.0.1, 127.0.0.4, 127.0.0.2, 127.0.0.5], ...

So in this setup, every token range has one end contributed by a node from
dc1 and the other end by a node from dc2.  That doesn't model anything in
the real topology of the cluster.

I see that it's easy to lump together tokens from all nodes and sort them,
to produce a single token ring (and this is obviously the reason why tokens
have to be unique throughout the cluster as a whole).  That doesn't mean
it's a meaningful thing to do.
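
To make that concrete, here is a toy reconstruction in plain Python, using
only the tokens from the ccm cluster above (the dicts and dc_of are mine,
nothing here is Cassandra code):

# Lump the tokens of both DCs together and sort them into one global ring,
# mirroring the "lump together and sort" step described above.
dc1 = {-9223372036854775808: '127.0.0.1',
       -3074457345618258603: '127.0.0.2',
        3074457345618258602: '127.0.0.3'}
dc2 = {-9223372036854775708: '127.0.0.4',
       -3074457345618258503: '127.0.0.5',
        3074457345618258702: '127.0.0.6'}

def dc_of(token):
    return 'dc1' if token in dc1 else 'dc2'

ring = sorted(list(dc1) + list(dc2))
# Pair each token with the next one, wrapping around at the end of the ring.
for start, end in zip(ring, ring[1:] + ring[:1]):
    print('(%d, %d]  %s -> %s' % (start, end, dc_of(start), dc_of(end)))
# Every printed range has one endpoint from dc1 and the other from dc2.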

This introduces complexity which was not present in the problem domain
initially.  It was a deliberate choice of the developers, dare I say, to
complect the separate DCs together in a single token ring.  This has
profound consequences on the operations side.  If anything, it prevents
bootstrapping multiple nodes at the same time even if they are in different
DCs.  Or would you suggest setting consistent_range_movement=false and
hoping it will work out?

If the whole reason for having separate DCs is to provide isolation, I fail
to see how the single token ring design does anything towards achieving
that.

> But also being very clear (I want to make sure I understand what you're
> saying): that's a manual thing you did, Cassandra didn't do it for you,
> right? The fact that Cassandra didn't STOP you from doing it could be
> considered a bug, but YOU made that config choice?
>

Yes, we have chosen exactly the same token for two nodes in different DCs
because we were unaware of this global uniqueness requirement.  Yes, we
believe it's a bug that Cassandra didn't stop us from doing that.

You can trivially predict what would happen with SimpleStrategy in
> multi-DC: run nodetool ring, and the first RF nodes listed after a given
> token own that data, regardless of which DC they're in. Because it's all
> one big ring.
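
For reference, that placement rule is easy to write down; here is a minimal
Python sketch using the same tokens as above (simple_strategy_replicas is my
own illustrative function, not Cassandra's actual code):

from bisect import bisect_left

# token -> node for both DCs lumped together, same values as the ccm cluster.
owner = {-9223372036854775808: '127.0.0.1', -3074457345618258603: '127.0.0.2',
          3074457345618258602: '127.0.0.3', -9223372036854775708: '127.0.0.4',
         -3074457345618258503: '127.0.0.5',  3074457345618258702: '127.0.0.6'}
ring = sorted(owner)

def simple_strategy_replicas(key_token, rf=3):
    # Walk the merged ring clockwise from the key's token and take the first
    # rf distinct nodes, completely ignoring which DC they belong to.
    start = bisect_left(ring, key_token)
    replicas = []
    i = 0
    while len(replicas) < rf:
        node = owner[ring[(start + i) % len(ring)]]
        if node not in replicas:
            replicas.append(node)
        i += 1
    return replicas

print(simple_strategy_replicas(0))
# ['127.0.0.3', '127.0.0.6', '127.0.0.1'] -- whichever DC they happen to be in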


In any case, I don't think SimpleStrategy is a valid argument to consider in
a multi-DC setup.  It is true that you can start a cluster spanning multiple
DCs from scratch while using SimpleStrategy, but there is no way to add a
new DC to the cluster unless you go NTS, so why bring up this example?

Cheers,
--
Alex
