[ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250217#comment-15250217 ]
Sylvain Lebresne commented on CASSANDRA-11521:
----------------------------------------------
bq. the protocol is very wasteful for the cases where you stream all the data
While I agree that it's probably time to think about optimizing this further, I
don't think the problem is specific to streaming, so I'm in favor of simply
optimizing the format itself in general, and I've created CASSANDRA-11622 for
that. I acknowledge that there may be some optimizations that would only pay off
when you're guaranteed to send a large amount of data, but optimizing the format
in general feels like a better first step in any case, since it's more broadly
useful.
bq. there is so much more we can do, in general, to make streaming faster, if
we go for something purpose-built instead
Making something purpose-built almost always allows for more optimization. But
it also means more complexity, a completely new mechanism for driver authors to
implement, and more code to maintain in general. I'm also not entirely convinced
it would buy us _that_ much over the "hint" idea (of course, how you weigh
performance against complexity is always somewhat subjective). In particular, I
want to note that the "hint" would clearly signal that you intend to read
everything, so we can still do a number of optimizations based on that
assumption, like keeping those queries from polluting our future user-space page
cache, and maybe having the server optimistically start serializing at least one
page in advance.
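To make the "serialize one page in advance" idea concrete, here is a minimal sketch in which the names `serve_pages`, `bulk_hint` and `log` are all hypothetical, not Cassandra code or protocol fields. The only assumption it relies on is the one the hint gives us: the client has promised to read every page, so preparing page N+1 before it is requested is never wasted work. Page-cache bypass is not modeled here.

```python
from typing import Iterator, List, Optional


def serve_pages(rows: List[int], page_size: int, bulk_hint: bool,
                log: List[int]) -> Iterator[bytes]:
    """Yield serialized pages of `rows` (hypothetical server-side model).

    `bulk_hint` stands in for an assumed "I will read everything" flag: when
    set, the next page is serialized before the client asks for it.
    `log` records the index of each page at the moment it is serialized.
    """
    pages = [rows[i:i + page_size] for i in range(0, len(rows), page_size)]

    def serialize(i: int) -> bytes:
        log.append(i)
        # Toy encoding; stands in for the real result serialization.
        return ",".join(map(str, pages[i])).encode()

    if not bulk_hint:
        # Normal paging: serialize lazily, one client request per page.
        for i in range(len(pages)):
            yield serialize(i)
        return

    # Hinted bulk read: always keep one serialized page ready ahead of the client.
    ahead: Optional[bytes] = serialize(0) if pages else None
    for i in range(len(pages)):
        current = ahead
        # Only safe because the hint promises the client will read it all.
        ahead = serialize(i + 1) if i + 1 < len(pages) else None
        yield current
```

With `bulk_hint=True`, by the time the client receives the first page the log already shows page 1 as serialized; with `bulk_hint=False`, only page 0 has been touched. In a real server the look-ahead would presumably run concurrently with sending the current page; the generator only shows the ordering.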
I also want to note that reusing the paging mechanism gives us fail-over pretty
much for free (as in, almost no additional work for drivers), which is nice.
Adding cancellation (which I agree would be useful) is also fairly simple.
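A minimal sketch of why paging makes fail-over and cancellation cheap, under stated assumptions: `Replica`, `ReplicaDown`, `fetch_page` and `read_all` are hypothetical names. The real paging state is an opaque blob that drivers hand back unchanged, and here it is modeled as a plain row offset. Because the paging state fully describes where the read left off, a client can resume on another replica from the same state. Cancellation is simply the client no longer asking for the next page.

```python
from typing import List, Optional, Tuple


class ReplicaDown(Exception):
    """Raised by a simulated replica that has failed."""


class Replica:
    def __init__(self, rows: List[int], fail_after: Optional[int] = None):
        self.rows = rows
        self.fail_after = fail_after  # pages served before this replica "fails"
        self.served = 0

    def fetch_page(self, paging_state: int,
                   page_size: int) -> Tuple[List[int], Optional[int]]:
        # Hypothetical model: the paging state is the offset of the next row.
        if self.fail_after is not None and self.served >= self.fail_after:
            raise ReplicaDown()
        self.served += 1
        page = self.rows[paging_state:paging_state + page_size]
        nxt = paging_state + page_size
        return page, (nxt if nxt < len(self.rows) else None)


def read_all(replicas: List[Replica], page_size: int,
             max_pages: Optional[int] = None) -> List[int]:
    """Read all rows, failing over to the next replica with the same paging state.

    `max_pages` models cancellation: the client just stops requesting pages.
    """
    out: List[int] = []
    state: Optional[int] = 0
    pages = 0
    idx = 0
    while state is not None:
        if max_pages is not None and pages >= max_pages:
            break  # cancelled: no further page requests are sent
        try:
            page, state = replicas[idx].fetch_page(state, page_size)
        except ReplicaDown:
            idx += 1  # fail over; `state` is unchanged, so no rows are lost
            if idx >= len(replicas):
                raise
            continue
        out.extend(page)
        pages += 1
    return out
```

For example, if the first replica dies after two pages of a ten-row read with `page_size=3`, the second replica picks up at offset 6, and the client still ends up with all ten rows, in order and without duplicates.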
Anyway, all this to say that I feel the "hint" idea would give us much of the
benefit for a lot less complexity (especially factoring in the work required for
all drivers). So while I'm curious to see the numbers Stefania is still working
on, I (for what it's worth) really like the idea of starting with that simple
approach, then focusing on other ideas that are not strictly protocol-related,
like CASSANDRA-11622 and CASSANDRA-11520, and only then re-evaluating whether
more complexity is justified or desirable.
> Implement streaming for bulk read requests
> ------------------------------------------
>
> Key: CASSANDRA-11521
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local Write-Read Paths
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.x
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer
> and eliminating the need to query individual pages one by one.