[
https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400395#comment-15400395
]
Stefania commented on CASSANDRA-11521:
--------------------------------------
The patch is ready for review:
||trunk|[patch|https://github.com/stef1927/cassandra/commits/11521]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11521-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11521-dtest/]|
There are also the [driver
patch|https://github.com/stef1927/java-driver/commits/11521] and the [spark
connector
patch|https://github.com/stef1927/spark-cassandra-connector/commits/11521]. For
these I plan to create tickets for the respective projects once the native
protocol changes have been finalized.
A [design
document|https://docs.google.com/document/d/1YqKGSU1P8EJIfMrO--29VaSoCy5mUu-ePfAiIOLsY7o/edit]
is also available.
The Spark benchmark results are available in [this
comment|https://issues.apache.org/jira/browse/CASSANDRA-9259?focusedCommentId=15400394&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15400394]
on the parent ticket. The final patch is slightly better than the
proof-of-concept, and the asynchronous paging mechanism significantly
outperforms the existing mechanism for large data sets.
I've also repeated some cstar_perf tests to rule out performance regressions
with ordinary queries, which are not in the optimized path:
* Single partition queries (default cassandra-stress read command) at
CL.LOCAL_ONE (the cassandra-stress default): [first
run|http://cstar.datastax.com/graph?command=one_job&stats=8b1f1d54-53e4-11e6-85af-0256e416528f&metric=99th_latency&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=276.98&ymin=0&ymax=22.33],
[second run with swapped revision
order|http://cstar.datastax.com/graph?command=one_job&stats=1abd3fe4-545e-11e6-8920-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=277.86&ymin=0&ymax=243951.4],
[an old
run|http://cstar.datastax.com/graph?command=one_job&stats=16cef080-53dc-11e6-b967-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=282.92&ymin=0&ymax=249571.3]
done before enabling token-aware routing in cassandra-stress.
* Single partition queries at CL.ALL: [unique
run|http://cstar.datastax.com/graph?command=one_job&stats=e2155410-5462-11e6-9cd7-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=277.75&ymin=0&ymax=246123.9]
Without token-aware routing the patch trails by 3.6K ops/second, and with
CL=ALL it trails by 1K ops/second. With token-aware routing the patch is
instead 1K ops/second faster. These differences must arise from the refactoring
of the select statement. They are very small, given that the test error seems
to be around 0.5K ops/second, but I can look into them further if there are
concerns.
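To put the gaps above in perspective, here is a rough sketch that expresses each one as a multiple of the approximate 0.5K ops/second test error. The sign convention (negative meaning the patch is slower), the labels and the helper name are assumptions made for illustration only. They are not taken from the cstar_perf output.

```python
# Throughput deltas quoted above, in ops/second (patch minus baseline).
# Assumption: a negative value means the patch is slower.
TEST_ERROR = 500  # approximate test error quoted above, in ops/second

deltas = {
    "CL.LOCAL_ONE, no token-aware routing": -3600,
    "CL.ALL": -1000,
    "CL.LOCAL_ONE, token-aware routing": 1000,
}


def error_multiples(deltas, error):
    """Express the magnitude of each delta as a multiple of the test error."""
    return {name: abs(delta) / error for name, delta in deltas.items()}


multiples = error_multiples(deltas, TEST_ERROR)
for name, multiple in multiples.items():
    print(f"{name}: {multiple:.1f}x the test error")
```

Only the run without token-aware routing stands clearly above the noise. The other two are within about twice the estimated error.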
> Implement streaming for bulk read requests
> ------------------------------------------
>
> Key: CASSANDRA-11521
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local Write-Read Paths
> Reporter: Stefania
> Assignee: Stefania
> Labels: client-impacting, protocolv5
> Fix For: 3.x
>
> Attachments: final-patch-jfr-profiles-1.zip
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer
> and eliminating the need to query individual pages one by one.