[
https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796141#comment-15796141
]
Jason Brown commented on CASSANDRA-4663:
----------------------------------------
[~iksaif] Unfortunately, I believe this patch isn't going to do what you want.
In the existing code, {{connectionsPerHost}} is passed down to
{{StreamCoordinator}}, where it is used primarily on the side that is
transferring files (see
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/streaming/StreamCoordinator.java#L189).
In {{StreamCoordinator#sliceSSTableDetails()}}, we essentially round-robin the
sstable chunks across the number of connections we will use. It's also
referenced from {{HostStreamingData#getOrCreateNextSession()}}, but that's not
going to increase the number of inbound sessions (see next paragraph).
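To illustrate the round-robin idea only, here is a minimal sketch. It is not
the actual {{StreamCoordinator}} code: the class name {{RoundRobinSlicer}}, the
{{slice}} method, and the choice to cap the bucket count at the number of items
are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; NOT the real StreamCoordinator implementation.
final class RoundRobinSlicer {
    // Deal items out to at most `connections` buckets, one item per bucket in rotation.
    static <T> List<List<T>> slice(List<T> items, int connections) {
        if (connections < 1)
            throw new IllegalArgumentException("connections must be >= 1");
        // Assumption: never create more buckets than there are items (but at least one).
        int buckets = Math.max(1, Math.min(connections, items.size()));
        List<List<T>> result = new ArrayList<>(buckets);
        for (int i = 0; i < buckets; i++)
            result.add(new ArrayList<>());
        // Item i goes to bucket i mod buckets.
        for (int i = 0; i < items.size(); i++)
            result.get(i % buckets).add(items.get(i));
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for sstable chunks.
        List<String> chunks = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(slice(chunks, 2)); // prints [[a, c, e], [b, d]]
    }
}
```

The point of the sketch is that the slicing happens on the side that holds the
files, so raising the connection count elsewhere does not change how many
buckets the sender fills.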
Your current patch sets {{connectionsPerHost}} at the receiver end of the
stream (even though that is the node initiating the stream session). Because
it's the bootstrapping node that requests the ranges from the peer, and given
the structure and protocol of the current stream session implementation
(session and protocol are essentially sequential, single-threaded, and expect a
single socket), I think you'd need to get down into the session and protocol
management code (and muck with a whole lot) in order to parallelize from the
(non-initiating) sending node.
I'm happy to be wrong about my understanding here, so if you've tested it out
and have seen good gains, please share :).
> Streaming sends one file at a time serially.
> ---------------------------------------------
>
> Key: CASSANDRA-4663
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Priority: Minor
> Fix For: 3.x
>
> Attachments:
> 0001-streaming-add-a-way-to-configure-the-number-of-conne.patch
>
>
> This is not fast enough when someone is using SSDs and maybe a 10G link. We
> should try to create multiple connections and send multiple files in
> parallel.
> The current approach underutilizes the link (even 1G).
> This change will improve the bootstrapping time of a node.