[ 
https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250217#comment-15250217
 ] 

Sylvain Lebresne commented on CASSANDRA-11521:
----------------------------------------------

bq. the protocol is very wasteful for the cases where you stream all the data

While I agree that it's probably time to think about optimizing this further, I 
don't think it's specific to streaming, so I'm in favor of optimizing the format 
itself in general, and I've created CASSANDRA-11622 for that. I acknowledge that 
some optimizations may only provide gains when you're guaranteed to send large 
amounts of data, but optimizing the format in general feels like a better first 
step in any case since it's more generally useful.

bq. there is so much more we can do, in general, to make streaming faster, if 
we go for something purpose-built instead

Making something purpose-built almost always allows for more optimization. But 
it also means more complexity, a completely new mechanism for driver authors 
and more code to maintain in general. I'm also not entirely convinced there is 
_that_ much it would allow over the "hint" idea (of course, how you value 
trade-offs between performance and complexity is always somewhat subjective). 
In particular, I want to note that the "hint" would clearly mean that you intend 
to read it all, so we can still do a bunch of optimizations on that assumption: 
for example, having those queries not pollute our future user-space page cache, 
and maybe having the server optimistically start serializing at least one page 
in advance.
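
To make the "serialize one page in advance" idea concrete, here is a minimal 
sketch in Python. It is not Cassandra code: `serialize_page`, `pages` and the 
`read_all_hint` flag are hypothetical names, and an in-memory list stands in 
for the data being read. It only illustrates that, once the hint says the 
client will read everything, the next page can be prepared while the current 
one is being consumed.

```python
from concurrent.futures import ThreadPoolExecutor


def serialize_page(rows, start, page_size):
    """Hypothetical stand-in for server-side page serialization.

    Returns the page and the offset of the next page (None when done).
    """
    chunk = rows[start:start + page_size]
    end = start + page_size
    return chunk, (end if end < len(rows) else None)


def pages(rows, page_size, read_all_hint=False):
    """Yield pages; with the hint set, prepare the next page in advance."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        page, next_start = serialize_page(rows, 0, page_size)
        while True:
            future = None
            if read_all_hint and next_start is not None:
                # Assumption: the hint means the client reads everything,
                # so building the next page early is not wasted work.
                future = pool.submit(serialize_page, rows, next_start, page_size)
            yield page  # the client consumes this page meanwhile
            if next_start is None:
                return
            if future is not None:
                page, next_start = future.result()
            else:
                page, next_start = serialize_page(rows, next_start, page_size)


for page in pages(list(range(7)), page_size=3, read_all_hint=True):
    print(page)
```

The pages returned are the same with or without the hint; only the moment at 
which the next page gets built changes.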

I also want to note that reusing the paging mechanism gives us fail-over pretty 
much for free (as in, almost no additional work from drivers), which is nice. 
And adding cancellation (which I agree would be nice) is also pretty simple.
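
As a rough illustration of why paging gives fail-over almost for free, the 
sketch below (Python, not driver code) has the client keep the paging state 
returned with the last page and resend it to another host when the current one 
fails. `Host`, `HostDown`, `read_all` and the integer paging state are all 
assumptions made for the example; in the real protocol the paging state is an 
opaque value. Cancellation is modelled as simply not requesting the next page.

```python
class HostDown(Exception):
    """Raised by the hypothetical Host when it stops answering."""


class Host:
    def __init__(self, name, fail_after=None):
        self.name = name
        self.fail_after = fail_after  # pages served before failing (None: never)
        self.served = 0

    def fetch_page(self, data, paging_state, page_size):
        """Return (rows, next_paging_state); next state is None on the last page."""
        if self.fail_after is not None and self.served >= self.fail_after:
            raise HostDown(self.name)
        self.served += 1
        start = paging_state or 0
        end = start + page_size
        return data[start:end], (end if end < len(data) else None)


def read_all(hosts, data, page_size, cancelled=lambda: False):
    """Read every page, resuming from the last paging state on another host."""
    rows, state = [], None
    for host in hosts:
        try:
            while not cancelled():
                page, state = host.fetch_page(data, state, page_size)
                rows.extend(page)
                if state is None:
                    return rows
            return rows  # cancelled: just stop asking for more pages
        except HostDown:
            continue  # retry the same paging state on the next host
    raise RuntimeError("no live host left")


hosts = [Host("a", fail_after=1), Host("b")]
print(read_all(hosts, list(range(5)), page_size=2))
```

Here host "a" serves one page and then fails; host "b" picks up from the saved 
paging state, so no rows are lost or repeated.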

Anyway, all this to say that I feel this "hint" idea would give a lot of the 
benefits for a lot less complexity (especially factoring in the work required 
for all drivers). So while I'm curious to see some of the numbers Stefania is 
still working on, I (for what it's worth) really like the idea of starting with 
that simple idea and then focusing on other (not strictly protocol-related) 
ideas like CASSANDRA-11622 and CASSANDRA-11520, and only then re-evaluating 
whether more complexity is justified/desirable.

> Implement streaming for bulk read requests
> ------------------------------------------
>
>                 Key: CASSANDRA-11521
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer 
> and eliminating the need to query individual pages one by one.



