[ 
https://issues.apache.org/jira/browse/CASSANDRA-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671413#comment-13671413
 ] 

Sylvain Lebresne commented on CASSANDRA-5286:
---------------------------------------------

I had a quick look at the design document and I went over the initial patch 
quickly. The patch is kind of big so I haven't looked at all the details tbh.  
But the overall design looks very reasonable to me, though I did noticed the 
following points/random suggestions :
* I don't think we should use CFPair/CFPath in new codes. We should use the 
cfId everywhere instead (I mean, let's keep CFPair where it is in repair for 
now, but let's not use it anywhere).
* Not sure about when reporting is updated on reads. On the read side it's 
after every row. But what if you have a 1GB sized rows? We'd probably want to 
call the reporting method at every X bytes like we do on writes (where we call 
it at least every DEFAULT_CHUNK_SIZE bytes I believe).
* I would create a StreamException (that can have few subclasses). And I would 
have StreamOperation start return a Future<Void> but potentially throwing said 
StreamException. The reason is that exceptions allows to carry information on 
what the problem was. We would also use them in events.  Typically 
SessionCompleteEvent could allow to get the actual exception rather than just a 
success boolean.
* I don't think we support adding new ranges to fetch/transfer once the stream 
operation is started. So to enfore that with the API, I would probably create 
some StreamingPlan class (a kind of builder) that would have the requestRanges 
and transferRanges methods, and then its start() method would actually start 
the transfer (asynchronously) and return the StreamOperation, that could 
directly implement Future<Void> (I'd suggest making it extend guava's 
AbstactFuture for instance).
* Shouldn't progress reporting work at a lower level that the row level one on 
the read side? What if you have a 1GB sized rows? Especially since on the write 
side it does work at the bytes level.
* I'm not totaly sold on StreamMessageListener. Only StreamSession implements 
it, and between that, StreamSession, ConnectionHandler and MessageHandlers, 
it's a bit too hard to follow what is doing what due to too much inderection 
imo.  Just having StreamSession have a onStreamMessage() (and a 
reportProgress(), which was weird in StreamMessageListener in the first place 
anyway since it doesn't correspond to a message) and let it do it's internal 
dispatch would be simpler imo.
* It would also feel more natural to me to onFileSend() in 
OutgoingMessageHandler directly (to keep things related to writing to the 
socket in OutgoingMessageHandler), and to have the StreamWriter callback some 
StreamSession.onFileSent() when done.
* In the same spirit of simplification, I would kill the ProgressSupport class, 
and have the reader/writer call directly a
{noformat}
StreamSession.reportProgress(Direction in, Descriptor desc, long currentBytes, 
long totalBytes);
{noformat}
where Direction can be a simple IN/OUT enum in StreamSession. And let 
StreamSession decide if it's worth firing up an event or not.
* Currently StreamOperation is not really asynchronous, right?.
* Nit: I think we could simplify sligtly SessionCompleteEvent and 
SessionPreparedEvent. Since they already ship the operationInfo, we could just 
have them have a sessionId. You would then get the SessionInfo doing 
{{event.operationInfo.sessions.get(event.sessionId)}} (provided the latter 
sessions is a map).
* Nit: In StreamSession, I'd use a State enum rather than 3 booleans (prepared, 
...).

bq. Still not versioned yet, but definitely we should

Agreed.

bq. I think we also want to support the ability to stream different versions of 
SSTable

I also think that would be nice, but I'm happy leaving that to a follow up 
ticket honestly. As long as we ship the sstable version number, we can add that 
later on.
                
> Streaming 2.0
> -------------
>
>                 Key: CASSANDRA-5286
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5286
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>              Labels: streaming
>             Fix For: 2.0
>
>
> 2.0 is the good time to redesign streaming API including protocol to make 
> streaming more performant and reliable.
> Design goals that come up in my mind:
> *Better performance*
>   - Protocol optimization
>   - Stream multiple files in parallel (CASSANDRA-4663)
>   - Persistent connection (CASSANDRA-4660)
> *Better control*
>   - Cleaner API for error handling
>   - Integrate both IN/OUT streams into one session, so the 
> components(bootstrap, move, bulkload, repair...) that use streaming can 
> manage them easily.
> *Better reporting*
>   - Better logging/tracing
>   - More metrics
>   - Progress reporting API for external client

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to