[ 
https://issues.apache.org/jira/browse/CASSANDRA-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392800#comment-15392800
 ] 

Paulo Motta commented on CASSANDRA-12229:
-----------------------------------------

It seems to me that sstable-based streaming will potentially have higher 
throughput than mutation-based streaming (since it skips sender ser/des 
overhead and potentially component construction if you transfer whole 
sstables), and may be preferable at first for bootstrapping, especially of 
dense nodes. While we ideally want to eventually have a single path for 
bootstrap and repair streaming, my understanding is that the main goal of 
CASSANDRA-8911 is to make repair more robust and efficient, while having a 
single path for that and bootstrap streaming would be a secondary goal to be 
pursued in a different ticket, so I don't see that as immediately superseding 
this.

With that said, while I agree we should synchronize between this and 
CASSANDRA-8911, I think we can still pursue both in parallel, especially if this 
is a requirement of CASSANDRA-8457, but initially focusing on porting the 
current protocol (1, 2, 3) to NIO and applying improvements where 
straightforward, while leaving more complex improvements (4, 5) for other 
tickets so we can re-evaluate them after having more progress on 
CASSANDRA-8911. Even if we advance with mutation-based streaming on 
CASSANDRA-8911 we can still use this to benchmark bootstrap performance with 
both approaches.

> Move streaming to non-blocking IO and netty (streaming 2.1)
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12229
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>             Fix For: 4.0
>
>
> As followup work to CASSANDRA-8457, we need to move streaming to use netty.
> Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files 
> are transferred between nodes in a cluster. However, the low-level details of 
> the current streaming implementation do not line up nicely with a 
> non-blocking model, so I think this is a good time to review some of those 
> details and add in additional goodness. The current implementation assumes a 
> sequential or "single threaded" approach to the sending of stream messages as 
> well as the transfer of files. In short, after several iterative prototypes, 
> I propose the following:
> 1) use a single bi-directional connection (instead of requiring two 
> sockets & two threads)
> 2) send the "non-file" {{StreamMessage}}s (basically anything not 
> {{OutboundFileMessage}}) via the normal internode messaging. This will 
> require a slight bit more management of the session (the ability to look up a 
> {{StreamSession}} from a static function on {{StreamManager}}), but we 
> have most of the pieces we need for this already.
> 3) switch to a non-blocking IO model (facilitated via netty)
> 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just 
> be a thing already
> 5) If the entire sstable is to be streamed, in addition to the DATA component, 
> transfer all the components of the sstable (primary index, bloom filter, 
> stats, and so on). This way we can avoid the CPU and GC pressure from 
> deserializing the stream into objects. File streaming then amounts to a 
> block-level transfer.
> Note: The progress/results of CASSANDRA-11303 will need to be reflected here, 
> as well.
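
For illustration, item 5 above (shipping whole sstable components as a 
block-level transfer) might be sketched in Java roughly as follows. This is 
only a sketch under stated assumptions, not the patch for this ticket: 
`BlockTransferSketch` and `transferComponent` are hypothetical names, and the 
target is assumed to be a blocking `WritableByteChannel`. The only real API 
relied on is the JDK's `FileChannel.transferTo`, which moves bytes to the 
channel without deserializing them into objects.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Cassandra: sends one sstable component
// file as raw bytes, with no per-row deserialization on the sender.
final class BlockTransferSketch {

    /** Copies every byte of {@code component} to {@code out}; returns the bytes sent. */
    static long transferComponent(Path component, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(component, StandardOpenOption.READ)) {
            long size = in.size();
            long sent = 0;
            while (sent < size) {
                // transferTo may move fewer bytes than requested, so keep looping.
                long n = in.transferTo(sent, size - sent, out);
                if (n <= 0) {
                    break; // no progress (e.g. the file shrank); caller can compare to size
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

A caller would invoke this once per component file (data, primary index, bloom 
filter, stats, and so on). A non-blocking version would need to handle partial 
writes through the event loop rather than looping inline as this sketch does.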



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
