Re: Streaming from 1 node only when adding a new DC

2016-06-16 Thread Fabien Rousseau
Thanks,

Created the issue: https://issues.apache.org/jira/browse/CASSANDRA-12015


Re: Streaming from 1 node only when adding a new DC

2016-06-15 Thread Paulo Motta
For rebuild, replace and -Dcassandra.consistent.rangemovement=false in
general, we currently pick the closest replica (as indicated by the Snitch)
that has the range, which will often map to the same node due to the
dynamic snitch, especially when N=RF. This is good for picking a node in the
same DC or rack for transferring, but we can probably improve this to
distribute the streaming load more evenly among candidate source nodes in the
same rack/DC.

Would you mind opening a ticket for improving this?
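
A minimal Python sketch of the balancing idea, under stated assumptions: the candidate replicas for each range have already been narrowed to the closest rack/DC, `proximity` is a hypothetical per-node score standing in for the snitch ordering (lower means closer), and `balanced_plan` and the node names are illustrative, not Cassandra's streaming API. Each range goes to the candidate that has been assigned the fewest ranges so far, and proximity only breaks ties.

```python
from collections import Counter


def balanced_plan(ranges, replicas_for, proximity):
    """Hypothetical source selection: map each range to the candidate replica
    with the fewest ranges assigned so far, breaking ties by proximity
    (lower score = closer, standing in for the snitch ordering)."""
    assigned = Counter()
    plan = {}
    for r in ranges:
        source = min(replicas_for(r), key=lambda n: (assigned[n], proximity[n]))
        plan[r] = source
        assigned[source] += 1
    return plan


# Toy setup mirroring the thread: 3 dc1 nodes with RF=3, so every dc1 node
# is a candidate for every range. The range count is hypothetical.
dc1 = ["dc1-a", "dc1-b", "dc1-c"]
proximity = {"dc1-a": 0.10, "dc1-b": 0.12, "dc1-c": 0.15}
ranges = range(768)

# Each dc2 node computes its own plan independently, from the same ring view.
load = Counter()
for _target in ["dc2-a", "dc2-b", "dc2-c"]:
    load.update(balanced_plan(ranges, lambda r: dc1, proximity).values())

print(load)  # Counter({'dc1-a': 768, 'dc1-b': 768, 'dc1-c': 768})
```

This balances range counts, not bytes, so ranges of very different sizes would call for a size-weighted variant. Picking a random candidate per range would also spread the load on average; the greedy version simply makes the split even and deterministic.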


Re: Streaming from 1 node only when adding a new DC

2016-06-14 Thread Fabien Rousseau
We've tested with C* version 2.1.14.
Yes, vnodes with 256 tokens.
Once all the nodes in dc2 are added, the schema is modified to have RF=3 in
dc1 and RF=3 in dc2.
Then, on each node of dc2:
nodetool rebuild dc1

Re: Streaming from 1 node only when adding a new DC

2016-06-14 Thread kurt Greaves
What version of Cassandra are you using? Also, what command are you using to
run the rebuilds? Are you using vnodes?

On 13 June 2016 at 09:01, Fabien Rousseau  wrote:

> Hello,
>
> We've tested adding a new DC from an existing DC having 3 nodes and RF=3
> (i.e. all nodes have all the data).
> During the rebuild process, only one node of the first DC streamed data to
> the 3 nodes of the second DC.
>
> Our goal is to minimise the time it takes to rebuild a DC, and we would
> like to be able to stream from all nodes.
>
> Starting C* with debug logs, it appears that all nodes, when computing
> their "streaming plan", return the same node for all ranges.
> This is probably because all nodes in DC2 have the same view of the ring.
>
> I understand that when bootstrapping a new node, it's preferable to stream
> from the node being replaced, but when rebuilding a new DC, it should
> probably select sources "randomly" (rather than always selecting the same
> source for a specific range).
> What do you think?
>
> Best Regards,
> Fabien
>



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com
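
The symptom in the original message can be reproduced with a toy Python model. This is a hypothetical sketch, not Cassandra's code: it assumes every rebuilding dc2 node ranks the candidate replicas with the same `proximity` scores (a stand-in for a snitch view shared by all dc2 nodes) and always streams a range from the top-ranked one. `closest_plan`, the node names and the range count are illustrative.

```python
from collections import Counter


def closest_plan(ranges, replicas_for, proximity):
    """Hypothetical 'closest replica' selection: every range is streamed from
    the candidate with the lowest proximity score."""
    return {r: min(replicas_for(r), key=lambda n: proximity[n]) for r in ranges}


# 3 dc1 nodes with RF=3: every dc1 node holds every range.
dc1 = ["dc1-a", "dc1-b", "dc1-c"]
proximity = {"dc1-a": 0.10, "dc1-b": 0.12, "dc1-c": 0.15}  # identical on every dc2 node
ranges = range(768)  # hypothetical range count

# Each dc2 node runs the same selection over the same ring view.
load = Counter()
for _target in ["dc2-a", "dc2-b", "dc2-c"]:
    load.update(closest_plan(ranges, lambda r: dc1, proximity).values())

print(load)  # Counter({'dc1-a': 2304})
```

Because the ranking is the same for every range and on every dc2 node, dc1-a ends up serving all 3 x 768 = 2304 range transfers while dc1-b and dc1-c stay idle.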