Hi Oleksandr,

if bootstrap is disabled, it will only skip the streaming phase but will
still go through token allocation and thus should use the new algorithm.
The algorithm won't try to spread data based on size on disk but it will
try to spread token ownership as evenly as possible.

The problem you'll run into is that ownership for a specific keyspace will
be null as long as the replication strategy isn't updated to create
replicas on the new DC.
Quickly thinking would make me do the following :

   - Create enough nodes in the new DC to match the target replication
   factor
   - Alter the replication strategy to add the target number of replicas in
   the new DC (they will start getting writes, and hopefully you've already
   segregated reads)
   - Continue adding nodes in the new DC (with auto_bootstrap = false),
   specifying the right keyspace to optimize token allocations
   - Run rebuild on all nodes in the new DC

I honestly never used it but that's my understanding of how it should work.

Cheers,


On Tue, Jan 16, 2018 at 3:51 PM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> Hello,
>
> We want to add a new rack to an existing cluster (a new Availability Zone
> on AWS).
>
> Currently we have 12 nodes in 2 racks with ~4 TB data per node.  We also
> want to have bigger number of smaller nodes.  In order to minimize the
> streaming we want to add a new DC which will span 3 racks and then
> decommission the old DC.
>
> Following the documented procedure we are going to create all nodes in the
> new DC with auto_bootstrap=false and a distinct dc_suffix.  Then we are
> going to run `nodetool rebuild OLD_DC` on every node.
>
> Since we are observing some uneven load distribution in the old DC, we
> wanted to make use of new token allocation algorithm of Cassandra 3.0+ when
> building the new DC.
>
> To our understanding, this is currently not supported, because the new
> algorithm can only be used during proper node bootstrap?
>
> In theory it should still be possible to allocate tokens in the new DC by
> telling Cassandra which keyspace to optimize for and from which remote DC
> the data will be streamed ultimately, or am I missing something?
>
> Reading through the original implementation ticket I didn't find any
> reference to interaction with rebuild:
> https://issues.apache.org/jira/browse/CASSANDRA-7032
> Nor do I find any open tickets that would discuss the topic.
>
> Is it reasonable to open an issue for that or is there some obvious
> blocker?
>
> Thanks,
> --
> Alex
>
>

-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Reply via email to