[
https://issues.apache.org/jira/browse/CASSANDRA-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008698#comment-16008698
]
Jason Brown commented on CASSANDRA-12229:
-----------------------------------------
[~aweisberg] created a
[PR|https://github.com/jasobrown/cassandra/pull/1/files], and added a bunch of
comments. I took his feedback, and created a new branch and a [new
PR|https://github.com/jasobrown/cassandra/pull/2/files] for comments.
Significant changes in this rev:
- Ariel suggested moving the disk IO off the event loop on the sending side,
and keeping blocking IO behavior for the disk reads. Doing this allowed me to go
back and reuse the {{StreamReader}}/{{StreamWriter}} set of classes. Getting
the disk reads to happen on the event loop required some back flips, so
ditching that code is not a bad thing.
- While I was reverting back to the {{StreamReader}} classes, I could also
revert the {{StreamMessage}} changes.
Reverting to (and lightly modifying) those classes resulted in nearly the
same performance (and there's always more tuning to be done), with a ~40%
reduction in the size of the patch set against trunk.
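To make the first bullet concrete, here is a minimal sketch of the idea (blocking disk reads on their own thread, with only the writes handed to the event loop). It is not the code from the PR: {{OffLoopFileSender}}, {{send}} and the {{channelWrite}} callback are hypothetical names, and a plain single-threaded executor stands in for the netty event loop so the example needs nothing beyond the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical illustration only; not the StreamWriter code from the patch.
final class OffLoopFileSender
{
    // Stand-in for a netty event loop: one thread that should never block on disk.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    // Separate thread that is allowed to block on disk reads.
    private final ExecutorService diskReader = Executors.newSingleThreadExecutor();

    // Reads the file with blocking IO on diskReader and hands each chunk to the
    // event loop, where channelWrite stands in for a socket write.
    // The returned future completes with the number of bytes sent.
    CompletableFuture<Long> send(Path file, int chunkSize, Consumer<byte[]> channelWrite)
    {
        return CompletableFuture.supplyAsync(() -> {
            long total = 0;
            byte[] buf = new byte[chunkSize];
            try (InputStream in = Files.newInputStream(file))
            {
                int n;
                while ((n = in.read(buf)) != -1)
                {
                    byte[] chunk = Arrays.copyOf(buf, n);
                    // join() is crude backpressure: wait for the write before reading more.
                    CompletableFuture.runAsync(() -> channelWrite.accept(chunk), eventLoop).join();
                    total += n;
                }
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
            return total;
        }, diskReader);
    }

    void shutdown()
    {
        diskReader.shutdown();
        eventLoop.shutdown();
    }
}
```

The blocking read loop stays simple and sequential, which is roughly why the existing reader/writer classes can be reused; only the hand-off to the event loop is asynchronous.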
A few oddities need to be cleaned up:
- {{SwappingByteBufDataOutputStreamPlus}} - this is an experiment from an
experimental branch from CASSANDRA-8457. The basic idea for this class is
sound, but the naming and implementation might be a bit funky.
- restoring a few unit tests
- I've (temporarily) removed the checksumming from
{{StreamCompressionSerializer}} as it incurs about a 30% performance
penalty on streaming uncompressed sstables. This cost might be covered over
once files can be streamed in parallel, but I've pulled it out for now and
would like to have a discussion on it.
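For the checksumming discussion, a rough sketch of what a per-chunk checksum adds on each side of the wire. This is not the {{StreamCompressionSerializer}} code: the {{ChecksummedFrame}} class, the frame layout and the choice of CRC32 are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration only; the layout and the use of CRC32 are assumptions.
final class ChecksummedFrame
{
    // Assumed frame layout: [int length][payload][int crc32 of payload, optional]
    static ByteBuffer encode(byte[] payload, boolean withChecksum)
    {
        int size = Integer.BYTES + payload.length + (withChecksum ? Integer.BYTES : 0);
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(payload.length);
        out.put(payload);
        if (withChecksum)
        {
            // Extra pass over every payload byte on the sending side.
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);
            out.putInt((int) crc.getValue());
        }
        out.flip();
        return out;
    }

    static byte[] decode(ByteBuffer frame, boolean withChecksum)
    {
        int length = frame.getInt();
        byte[] payload = new byte[length];
        frame.get(payload);
        if (withChecksum)
        {
            // Second extra pass on the receiving side, then compare with the trailer.
            CRC32 crc = new CRC32();
            crc.update(payload, 0, length);
            if ((int) crc.getValue() != frame.getInt())
                throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }
}
```

With {{withChecksum}} set to false the sender and receiver each skip a full pass over the payload, which is the kind of cost in question for uncompressed sstables; the trade-off is that corruption in transit is no longer detected at this layer.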
> Move streaming to non-blocking IO and netty (streaming 2.1)
> -----------------------------------------------------------
>
> Key: CASSANDRA-12229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12229
> Project: Cassandra
> Issue Type: Improvement
> Components: Streaming and Messaging
> Reporter: Jason Brown
> Assignee: Jason Brown
> Fix For: 4.0
>
>
> As followup work to CASSANDRA-8457, we need to move streaming to use netty.
> Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files
> are transferred between nodes in a cluster. However, the low-level details of
> the current streaming implementation do not line up nicely with a
> non-blocking model, so I think this is a good time to review some of those
> details and add in additional goodness. The current implementation assumes a
> sequential or "single threaded" approach to the sending of stream messages as
> well as the transfer of files. In short, after several iterative prototypes,
> I propose the following:
> 1) use a single bi-directional connection (instead of requiring two
> sockets & two threads)
> 2) send the "non-file" {{StreamMessage}}s (basically anything not
> {{OutboundFileMessage}}) via the normal internode messaging. This will
> require a slight bit more management of the session (the ability to look up a
> {{StreamSession}} from a static function on {{StreamManager}}), but we
> have most of the pieces we need for this already.
> 3) switch to a non-blocking IO model (facilitated via netty)
> 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just
> be a thing already
> 5) If the entire sstable is to be streamed, in addition to the DATA component,
> transfer all the components of the sstable (primary index, bloom filter,
> stats, and so on). This way we can avoid the CPU and GC pressure from
> deserializing the stream into objects. File streaming then amounts to a
> block-level transfer.
> Note: The progress/results of CASSANDRA-11303 will need to be reflected here,
> as well.
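As a rough illustration of point 5 in the description (whole-sstable streaming as a block-level transfer), the sketch below copies each component file to a channel with {{FileChannel.transferTo}}, without deserializing anything into objects. It is not taken from the patch: {{WholeSSTableTransfer}} and {{transferComponents}} are hypothetical names, and a real protocol would also need to send each component's name and length so the receiver can split the stream again.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical illustration only; not code from the patch.
final class WholeSSTableTransfer
{
    // Copies every component file (data, primary index, bloom filter, stats, ...)
    // to the target channel as raw bytes and returns the total bytes written.
    static long transferComponents(List<Path> components, WritableByteChannel target) throws IOException
    {
        long total = 0;
        for (Path component : components)
        {
            try (FileChannel in = FileChannel.open(component, StandardOpenOption.READ))
            {
                long pos = 0;
                long size = in.size();
                // transferTo may move fewer bytes than requested, so loop until done.
                while (pos < size)
                    pos += in.transferTo(pos, size - pos, target);
                total += size;
            }
        }
        return total;
    }
}
```

When the target is a socket channel, the JDK can often perform this copy without pulling the bytes onto the Java heap, which is where the reduced CPU and GC pressure would come from.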
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)