[ 
https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235496#comment-15235496
 ] 

Aleksey Yeschenko commented on CASSANDRA-11521:
-----------------------------------------------

[~brianmhess] Does the C*-Spark integration use CL.LOCAL_ONE for reads? I know
we use QUORUM for writes, as a method of overload control.

A small hint on top of a regular {{SELECT}} is a decent first step, but there
is much more we can do to make streaming faster if we go for something
purpose-built instead (even if built on top of the native protocol), with
proper support from the driver.

Among other things, the protocol is very wasteful for the cases where you
stream all the data, especially if you have big partitions and a few clustering
columns. Clustering column repetition as part of cell names is now fully gone
from sstables and the in-memory representation, but in the protocol itself we
still repeat, with each row, all the clustering columns (even if many rows
share them) and the partition key columns. We could get rid of that, and all
the related redundant serialisation, if we don't build on top of ResultSet.
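
To make the redundancy concrete, here is a rough Python sketch. It is not the
actual native protocol encoding. It compares a flat, ResultSet-style layout,
where every row carries every column, with a hypothetical grouped layout that
sends the partition key once per partition and the shared clustering prefix
once per run of rows. The table shape (one partition key, two clustering
columns, one value), the sample data, and all function names are assumptions
made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Assumed row shape: (partition_key, clustering_prefix, clustering_last, value),
# already sorted by partition key and clustering order, as a scan returns them.
rows = [
    ("sensor-1", "2016-04-11", 0, 20.5),
    ("sensor-1", "2016-04-11", 1, 20.7),
    ("sensor-1", "2016-04-12", 0, 19.9),
    ("sensor-2", "2016-04-11", 0, 22.1),
]

def encode_flat(rows):
    """ResultSet-style layout: every row repeats every column."""
    return [list(row) for row in rows]

def encode_grouped(rows):
    """Hypothetical layout: partition key and clustering prefix sent once per group."""
    out = []
    for pk, partition in groupby(rows, key=itemgetter(0)):
        groups = []
        for prefix, run in groupby(partition, key=itemgetter(1)):
            # Only the columns that differ between rows are kept per row.
            groups.append((prefix, [(r[2], r[3]) for r in run]))
        out.append((pk, groups))
    return out

def count_flat(encoded):
    """Number of column values serialised in the flat layout."""
    return sum(len(row) for row in encoded)

def count_grouped(encoded):
    """Number of column values serialised in the grouped layout."""
    total = 0
    for _pk, groups in encoded:
        total += 1                      # partition key, once
        for _prefix, tail in groups:
            total += 1 + 2 * len(tail)  # prefix once, then two values per row
    return total

flat_count = count_flat(encode_flat(rows))           # 16 values
grouped_count = count_grouped(encode_grouped(rows))  # 13 values
```

The saving grows with the number of rows that share a partition key and
clustering prefix, which is exactly the big-partition case described above.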

Secondly, it's not common at all to multiplex a single session between
transactional and analytical workloads. So a single Spark Java driver session
is only going to be dealing with streaming itself (maybe even only a single
stream at a time?). We could add a new command ({{STREAM}}), with a query and,
say, a throughput limit or a maximum number of unacknowledged rows/bytes, and
have the server push as much as it can without violating the limits. The
stream would be cancellable.
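
As a minimal sketch of the "maximum number of unacknowledged rows" idea, the
Python below models the server-side state for one stream. No {{STREAM}} command
exists in the protocol today, so the class, its methods, and the row-based
window are all hypothetical names and choices, not a proposed wire format.

```python
_END = object()  # sentinel marking the end of the row source

class StreamSession:
    """Hypothetical server-side state for a single STREAM request."""

    def __init__(self, rows, max_unacked):
        self._rows = iter(rows)
        self.max_unacked = max_unacked  # window: rows in flight without an ack
        self.unacked = 0
        self.cancelled = False

    def push(self):
        """Return the rows that can be sent now without exceeding the window."""
        batch = []
        while not self.cancelled and self.unacked < self.max_unacked:
            row = next(self._rows, _END)
            if row is _END:
                break
            batch.append(row)
            self.unacked += 1
        return batch

    def ack(self, n):
        """Client acknowledged n rows, which reopens part of the window."""
        self.unacked = max(0, self.unacked - n)

    def cancel(self):
        """Client cancelled the stream; nothing further is pushed."""
        self.cancelled = True

session = StreamSession(range(10), max_unacked=4)
first = session.push()    # fills the window: [0, 1, 2, 3]
stalled = session.push()  # window is full, so nothing is sent: []
session.ack(2)
second = session.push()   # two slots reopened: [4, 5]
```

A byte-based or throughput-based limit would follow the same pattern, with the
counter tracking bytes or a rate instead of rows.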

Also, ideally, once we switch to the user-space page cache, these queries 
should not be polluting it.

> Implement streaming for bulk read requests
> ------------------------------------------
>
>                 Key: CASSANDRA-11521
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer 
> and eliminating the need to query individual pages one by one.


