[
https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235496#comment-15235496
]
Aleksey Yeschenko commented on CASSANDRA-11521:
-----------------------------------------------
[~brianmhess] Does C*-Spark integration use CL.LOCAL_ONE for reads? I know we
do use QUORUM for writes, as a method for overload control.
A small hint on top of a regular {{SELECT}} is a decent first step, but there
is much more we could do to make streaming faster if we went for something
purpose-built instead (even if built on top of the native protocol), with
proper support from the driver.
Among other things, the protocol is very wasteful when you stream all the
data, especially if you have big partitions and a few clustering columns.
While clustering column repetition as part of cell names is now fully gone
from sstables and the in-memory representation, the protocol itself still
repeats, with each row, both the partition key columns and all the clustering
columns, even if many rows share them. We could get rid of that, and all the
related redundant serialisation, by not building on top of ResultSet.
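To get a rough feel for the repetition described above, here is a minimal
Python sketch. It is not Cassandra or driver code: the sample rows, the
two-level grouping and all function names are assumptions made up for
illustration. It only counts how many field values go over the wire when every
row carries its partition key and clustering columns, versus when shared
values are sent once per run of rows.

```python
from itertools import groupby

# Hypothetical rows in the order a partition scan would return them:
# (partition_key, clustering_1, clustering_2, value)
ROWS = [
    ("sensor-1", "2016-04-11", 0, 20.5),
    ("sensor-1", "2016-04-11", 1, 20.7),
    ("sensor-1", "2016-04-12", 0, 19.9),
    ("sensor-2", "2016-04-11", 0, 31.0),
]


def flat_field_count(rows):
    """ResultSet-style: every row repeats every column value."""
    return sum(len(row) for row in rows)


def group_rows(rows):
    """Emit each partition key once and each shared clustering prefix once."""
    partitions = []
    for pk, in_partition in groupby(rows, key=lambda r: r[0]):
        prefixes = []
        for c1, in_prefix in groupby(in_partition, key=lambda r: r[1]):
            # Only the remaining clustering column and the value are per-row.
            prefixes.append((c1, [(r[2], r[3]) for r in in_prefix]))
        partitions.append((pk, prefixes))
    return partitions


def grouped_field_count(partitions):
    """Count field values in the grouped layout."""
    count = 0
    for _pk, prefixes in partitions:
        count += 1  # partition key, sent once
        for _c1, tail in prefixes:
            count += 1 + 2 * len(tail)  # shared prefix once, then the rows
    return count


if __name__ == "__main__":
    print("flat:   ", flat_field_count(ROWS))
    print("grouped:", grouped_field_count(group_rows(ROWS)))
```

With only four rows the difference is small; the saving grows with the number
of rows that share a partition key or clustering prefix, which is the big
partition case mentioned above.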
Secondly, it's not at all common to multiplex a single session between
transactional and analytical workloads, so a single Spark Java driver session
is only going to be dealing with streaming itself (maybe even only a single
stream at a time?). We could add a new command ({{STREAM}}) that takes a query
and, say, a throughput limit or a maximum number of unacknowledged rows/bytes,
and have the server push as much as it can without violating the limits. The
stream would be cancellable.
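The "maximum number of unacknowledged rows" idea can be sketched as a simple
window: the server pushes until the window is full and resumes as the client
acknowledges rows. This is only an illustration in Python; {{STREAM}} does not
exist, and the class name, method names and row-based (rather than byte-based)
limit are all assumptions.

```python
from collections import deque


class StreamSession:
    """Hypothetical server-side state for one cancellable stream."""

    def __init__(self, rows, max_unacked):
        if max_unacked < 1:
            raise ValueError("max_unacked must be at least 1")
        self._pending = deque(rows)      # rows not yet pushed
        self._max_unacked = max_unacked  # limit supplied by the client
        self._unacked = 0                # rows pushed but not yet acked
        self._cancelled = False

    def push(self):
        """Return the rows that may be sent now without exceeding the limit."""
        batch = []
        while (not self._cancelled
               and self._pending
               and self._unacked < self._max_unacked):
            batch.append(self._pending.popleft())
            self._unacked += 1
        return batch

    def ack(self, count):
        """Client acknowledges `count` rows, which frees room in the window."""
        if count < 0 or count > self._unacked:
            raise ValueError("cannot acknowledge more rows than are in flight")
        self._unacked -= count

    def cancel(self):
        """Stop the stream; nothing further is pushed."""
        self._cancelled = True
        self._pending.clear()


if __name__ == "__main__":
    session = StreamSession(range(10), max_unacked=4)
    print(session.push())  # window fills up
    session.ack(2)
    print(session.push())  # two more rows fit
    session.cancel()
    print(session.push())  # nothing after cancellation
```

A throughput limit would need a clock-based variant (for example a token
bucket) instead of an acknowledgement count, but the push-until-limit shape
stays the same.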
Also, ideally, once we switch to the user-space page cache, these queries
should not be polluting it.
> Implement streaming for bulk read requests
> ------------------------------------------
>
> Key: CASSANDRA-11521
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local Write-Read Paths
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.x
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer
> and eliminating the need to query individual pages one by one.