Yes, you are right: it limits how much data a node will send in total to
other nodes while streaming (repair, bootstrap, etc.), so that streaming
does not affect this node's performance.

Bootstrapping is initiated by the bootstrapping node itself, which
determines, based on its token, which nodes to request data from. It then
computes its "streaming plan" and initiates every session at the same time
(look at org.apache.cassandra.streaming.StreamResultFuture#init).

But I can see in the code that the connections for the sessions are made
sequentially in a fixed-size thread pool executor limited by the node's
number of processors, so, if I understand the code correctly, this might be
the limit you are looking for.

Because of that limit, you should only have a limited number of streaming
sessions ongoing at the same time, but I have to admit that, because of all
the async methods/callbacks in the code, I might be wrong :(
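
To illustrate the kind of limit I mean, here is a minimal sketch. It is
not Cassandra code: the class and method names are made up for the example,
and the sleep is only a stand-in for connection setup. It just shows that
tasks submitted all at once to a fixed-size pool never run more than
"pool size" at a time, which is the behaviour I would expect if the session
connections go through such an executor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Cassandra.
class BoundedSessionDemo {

    // Submits `sessions` fake "connect" tasks to a fixed-size pool and returns
    // the highest number of tasks that were running at the same time.
    static int peakConcurrency(int poolSize, int sessions) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < sessions; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the maximum seen
                try {
                    Thread.sleep(20); // stand-in for connection setup work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        pool.shutdown(); // no new tasks; queued ones still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Assumption for the example: pool sized by processor count,
        // 15 tasks standing in for 15 source nodes.
        int cores = Runtime.getRuntime().availableProcessors();
        int peak = peakConcurrency(cores, 15);
        System.out.println("pool size = " + cores + ", peak concurrent tasks = " + peak);
    }
}
```

Whatever the number of submitted tasks, the reported peak stays at or below
the pool size; the remaining tasks simply wait in the executor's queue.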


On 20 February 2018 at 14:08, Jürgen Albersdorfer <jalbersdor...@gmail.com>
wrote:

> Hi Nicolas,
> I have seen 'stream_throughput_outbound_megabits_per_sec', but
> afaik this limits what each node will provide at a maximum.
> What I'm more concerned about is the vast number of connections to handle
> and the concurrent threads, of which at least two get started for every
> single streaming connection.
> I'm a former Java Developer and I know that Threads are expensive even to
> just have them sitting around. JVM is doing fine handling some hundreds of
> threads, but what about some thousands?
>
> And about the cleanup - we are currently massively scaling out a startup
> database. I thought of doing cleanup after that ramp-up phase, when scaling
> slows down to maybe a node every 3 to 7 days on average.
> So for now I want to add 32 machines one after another and then take care
> of the cleanup afterwards, one node at a time.
>
> regards,
> Jürgen
>
> 2018-02-20 13:56 GMT+01:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:
>
>> Hi Jurgen,
>>
>> stream_throughput_outbound_megabits_per_sec is the "given total
>> throughput in Mbps", so it does limit the "concurrent throughput" IMHO,
>> is it not what you are looking for?
>>
>> The only limits I can think of are:
>> - the number of connections between every node and the one bootstrapping
>> - the number of pending compactions (especially if you have lots of
>> keyspaces/tables), which could maybe lead to some JVM problems?
>>
>> Anyway, while bootstrapping, a node is not accepting reads, and
>> settings like compactionthroughput, concurrentcompactor and
>> streamingthroughput can be set on the fly using nodetool, so you can
>> quickly adjust them.
>>
>> Out of curiosity, do you run "nodetool cleanup" in parallel on all the
>> remaining nodes after a bootstrap, or do you spread the "cleanup load"? I
>> have not yet seen anyone adding a node every day like this ;) have fun!
>>
>>
>>
>> On 20 February 2018 at 13:42, Jürgen Albersdorfer <
>> jalbersdor...@gmail.com> wrote:
>>
>>> Hi, I'm wondering if it is possible, or would even make sense, to limit
>>> concurrent streaming when joining a new node to the cluster.
>>>
>>> I'm currently operating a 15-Node C* Cluster (V 3.11.1) and joining
>>> another Node every day.
>>> The 'nodetool netstats' shows it always streams data from all other
>>> nodes.
>>>
>>> How far will this scale? - What happens when I have hundreds or even
>>> thousands of nodes?
>>>
>>> Does anyone have experience with such a situation?
>>>
>>> Thanks, and regards
>>> Jürgen
>>>
>>
>>
>
