[
https://issues.apache.org/jira/browse/CASSANDRA-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671413#comment-13671413
]
Sylvain Lebresne commented on CASSANDRA-5286:
---------------------------------------------
I had a quick look at the design document and I went over the initial patch
quickly. The patch is kind of big so I haven't looked at all the details tbh.
But the overall design looks very reasonable to me, though I did noticed the
following points/random suggestions :
* I don't think we should use CFPair/CFPath in new codes. We should use the
cfId everywhere instead (I mean, let's keep CFPair where it is in repair for
now, but let's not use it anywhere).
* Not sure about when reporting is updated on reads. On the read side it's
after every row. But what if you have a 1GB sized rows? We'd probably want to
call the reporting method at every X bytes like we do on writes (where we call
it at least every DEFAULT_CHUNK_SIZE bytes I believe).
* I would create a StreamException (that can have few subclasses). And I would
have StreamOperation start return a Future<Void> but potentially throwing said
StreamException. The reason is that exceptions allows to carry information on
what the problem was. We would also use them in events. Typically
SessionCompleteEvent could allow to get the actual exception rather than just a
success boolean.
* I don't think we support adding new ranges to fetch/transfer once the stream
operation is started. So to enfore that with the API, I would probably create
some StreamingPlan class (a kind of builder) that would have the requestRanges
and transferRanges methods, and then its start() method would actually start
the transfer (asynchronously) and return the StreamOperation, that could
directly implement Future<Void> (I'd suggest making it extend guava's
AbstactFuture for instance).
* Shouldn't progress reporting work at a lower level that the row level one on
the read side? What if you have a 1GB sized rows? Especially since on the write
side it does work at the bytes level.
* I'm not totaly sold on StreamMessageListener. Only StreamSession implements
it, and between that, StreamSession, ConnectionHandler and MessageHandlers,
it's a bit too hard to follow what is doing what due to too much inderection
imo. Just having StreamSession have a onStreamMessage() (and a
reportProgress(), which was weird in StreamMessageListener in the first place
anyway since it doesn't correspond to a message) and let it do it's internal
dispatch would be simpler imo.
* It would also feel more natural to me to onFileSend() in
OutgoingMessageHandler directly (to keep things related to writing to the
socket in OutgoingMessageHandler), and to have the StreamWriter callback some
StreamSession.onFileSent() when done.
* In the same spirit of simplification, I would kill the ProgressSupport class,
and have the reader/writer call directly a
{noformat}
StreamSession.reportProgress(Direction in, Descriptor desc, long currentBytes,
long totalBytes);
{noformat}
where Direction can be a simple IN/OUT enum in StreamSession. And let
StreamSession decide if it's worth firing up an event or not.
* Currently StreamOperation is not really asynchronous, right?.
* Nit: I think we could simplify sligtly SessionCompleteEvent and
SessionPreparedEvent. Since they already ship the operationInfo, we could just
have them have a sessionId. You would then get the SessionInfo doing
{{event.operationInfo.sessions.get(event.sessionId)}} (provided the latter
sessions is a map).
* Nit: In StreamSession, I'd use a State enum rather than 3 booleans (prepared,
...).
bq. Still not versioned yet, but definitely we should
Agreed.
bq. I think we also want to support the ability to stream different versions of
SSTable
I also think that would be nice, but I'm happy leaving that to a follow up
ticket honestly. As long as we ship the sstable version number, we can add that
later on.
> Streaming 2.0
> -------------
>
> Key: CASSANDRA-5286
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5286
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Yuki Morishita
> Labels: streaming
> Fix For: 2.0
>
>
> 2.0 is the good time to redesign streaming API including protocol to make
> streaming more performant and reliable.
> Design goals that come up in my mind:
> *Better performance*
> - Protocol optimization
> - Stream multiple files in parallel (CASSANDRA-4663)
> - Persistent connection (CASSANDRA-4660)
> *Better control*
> - Cleaner API for error handling
> - Integrate both IN/OUT streams into one session, so the
> components(bootstrap, move, bulkload, repair...) that use streaming can
> manage them easily.
> *Better reporting*
> - Better logging/tracing
> - More metrics
> - Progress reporting API for external client
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira