[ 
https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796141#comment-15796141
 ] 

Jason Brown commented on CASSANDRA-4663:
----------------------------------------

[~iksaif] Unfortunately, I don't believe this patch is going to do what you want. 
In the existing code, {{connectionsPerHost}} is passed down to 
{{StreamCoordinator}}, where it is used primarily on the side that 
is transferring files (see 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/streaming/StreamCoordinator.java#L189).
In {{StreamCoordinator#sliceSSTableDetails()}}, we essentially round-robin the 
sstable chunks across the number of connections we will use. It's also 
referenced from {{HostStreamingData#getOrCreateNextSession()}}, but that's not 
going to increase the number of inbound sessions (see next paragraph).
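
To make the round-robin idea concrete, here is a rough sketch of that kind of slicing. This is not the actual {{StreamCoordinator}} code; the class {{RoundRobinSlicer}} and its {{slice}} method are hypothetical names used only for illustration, and the real implementation works on sstable details rather than plain lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual Cassandra implementation.
class RoundRobinSlicer {
    // Distributes chunks across `connections` buckets, one chunk at a time,
    // cycling through the buckets in order.
    static <T> List<List<T>> slice(List<T> chunks, int connections) {
        if (connections < 1) {
            throw new IllegalArgumentException("connections must be >= 1");
        }
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < connections; i++) {
            buckets.add(new ArrayList<>());
        }
        int next = 0;
        for (T chunk : chunks) {
            buckets.get(next).add(chunk);
            next = (next + 1) % connections; // wrap around to the first bucket
        }
        return buckets;
    }
}
```

With five chunks and two connections, the first bucket ends up with chunks 1, 3 and 5 and the second with chunks 2 and 4. Note that this only spreads work on the side that is sending the files.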

Your current patch sets {{connectionsPerHost}} at the receiving end of the 
stream (even though that is the node initiating the stream 
session). Because it's the bootstrapping node that requests the ranges from the 
peer, and given the structure and protocol of the current stream session 
implementation (session and protocol are essentially sequential, 
single-threaded, and expect a single socket), I think you'd need to get down into 
the session and protocol management code (and muck with a whole lot) in order to 
parallelize from the (non-initiating) sending node. 

I'm happy to be wrong about my understanding here, so if you've tested it out 
and have seen good gains, please share :).

> Streaming sends one file at a time serially. 
> ---------------------------------------------
>
>                 Key: CASSANDRA-4663
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 
> 0001-streaming-add-a-way-to-configure-the-number-of-conne.patch
>
>
> This is not fast enough when someone is using SSDs and maybe a 10G link. We 
> should try to create multiple connections and send multiple files in 
> parallel. 
> The current approach underutilizes the link (even 1G).
> This change will improve the bootstrapping time of a node. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
