[
https://issues.apache.org/jira/browse/CASSANDRA-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032195#comment-16032195
]
Jason Brown commented on CASSANDRA-12229:
-----------------------------------------
Pushed more changes onto the shame branch (sha
{{e3dd956fadb8084babdb2d96e222a3b2949589bd}})
- removed waitUntilWritable() from {{ByteBufDataOutputStreamPlus}} and replaced
with a simple semaphore.
- cleaned up the close/cancel notification in {{StreamingInboundHander}} from
the {{thread#interrupt()}} to using an atomic variable (polling it from the
background thread).
- corrected and optimized the way ByteBufs and ByteBuffers are used in the
stream classes
- restore CASSANDRA-7585 and bring back StreamingConnectionFactory, with
modified versions of it's implementations and then inject the factory where we
used to do it.
- move stream versioning back into StreamMessage, instead of StreamSession
(where I had moved it)
- minor clean ups and doc additions
- addressed TODO's and error handling
Here are a few responses to some of the concerns raised by [~aweisberg] from
the current PR.
bq. This is really a generic inbound message handler right? Are there
potentially races here with multiple threads interacting with
StreamResultFuture and StreamManager and the global state? Can they register
different state and then end up with a different StreamResultFuture or session?
I don't believe this to be the case as the intiator sends out the
{{StreamInitMessage}} and subsequent control messages on the same (control)
channel, so you have TCP ordering guarantees there. Further, most of the
important, state-changing functions related to {{StreamSession}} are already
{{synchronized}}.
bq. Ideally I think each stream session would get a dedicated control channel
and a dedicated actor thread to orchestrate control messages. Completely
decouple the whole thing and turn into a queue of events being processed
serially. The only concurrency and parallelism is then files being transferred
in parallel which is pretty clearly shared nothing other than resource limits
which are generally pretty easy to reason about since there is no risk of
deadlock.
The trick with that is we would need to send session identifier info in every
message so that the correct {{StreamSession}} can be identified. Otherwise, you
have to stick a reference to the {{StreamSession}} somewhere: in the channel
attributes, in the {{StreamInboundHandler}}, or in the deserialization task
(like what I currently have). Further, I'm now trying to avoid altering the
stream message format/protocol, so long as correctness is maintained, because
there's a boat load of other changes happening here.
bq. ... checksums ...
I did not change the checksum algorithm for the LZ4 compression on internode
messaging as that would be a change to the (undocumented) internode messaging
protocol, and would break upgrades. Further, Sylvain didn't want to change the
internode messaging protocol for this release, so I think we're stuck with the
32-bit checksumming as that's what we have in the existing
[{{OutboundTcpConnection}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L530].
> Move streaming to non-blocking IO and netty (streaming 2.1)
> -----------------------------------------------------------
>
> Key: CASSANDRA-12229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12229
> Project: Cassandra
> Issue Type: Improvement
> Components: Streaming and Messaging
> Reporter: Jason Brown
> Assignee: Jason Brown
> Fix For: 4.0
>
>
> As followup work to CASSANDRA-8457, we need to move streaming to use netty.
> Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files
> are transferred between nodes in a cluster. However, the low-level details of
> the current streaming implementation does not line up nicely with a
> non-blocking model, so I think this is a good time to review some of those
> details and add in additional goodness. The current implementation assumes a
> sequential or "single threaded" approach to the sending of stream messages as
> well as the transfer of files. In short, after several iterative prototypes,
> I propose the following:
> 1) use a single bi-diredtional connection (instead of requiring to two
> sockets & two threads)
> 2) send the "non-file" {{StreamMessage}} s (basically anything not
> {{OutboundFileMessage}}) via the normal internode messaging. This will
> require a slight bit more management of the session (the ability to look up a
> {{StreamSession}} from a static function on {{StreamManager}}, but we have
> have most of the pieces we need for this already.
> 3) switch to a non-blocking IO model (facilitated via netty)
> 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just
> be a thing already
> 5) If the entire sstable is to streamed, in addition to the DATA component,
> transfer all the components of the sstable (primary index, bloom filter,
> stats, and so on). This way we can avoid the CPU and GC pressure from
> deserializing the stream into objects. File streaming then amounts to a
> block-level transfer.
> Note: The progress/results of CASSANDRA-11303 will need to be reflected here,
> as well.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]