[jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests

Stefania (JIRA) Fri, 29 Jul 2016 19:48:50 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400395#comment-15400395
 ]


Stefania commented on CASSANDRA-11521:
--------------------------------------

The patch is ready for review:

||trunk|[patch|https://github.com/stef1927/cassandra/commits/11521]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11521-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11521-dtest/]|

There are also the 
[driver patch|https://github.com/stef1927/java-driver/commits/11521] and the 
[Spark connector patch|https://github.com/stef1927/spark-cassandra-connector/commits/11521]. 
For these I plan to create tickets in the respective projects once the native 
protocol changes have been finalized.

A [design document|https://docs.google.com/document/d/1YqKGSU1P8EJIfMrO--29VaSoCy5mUu-ePfAiIOLsY7o/edit] 
is also available.

The Spark benchmark results are available in 
[this comment|https://issues.apache.org/jira/browse/CASSANDRA-9259?focusedCommentId=15400394&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15400394] 
on the parent ticket. The final patch is slightly better than the 
proof-of-concept, and the asynchronous paging mechanism significantly 
outperforms the existing mechanism for large data sets.

I've also repeated some cstar_perf tests to rule out performance regressions 
with ordinary queries, which are not in the optimized path:

* Single partition queries (default cassandra-stress read command) at 
CL.LOCAL_ONE (the cassandra-stress default): 
[first run|http://cstar.datastax.com/graph?command=one_job&stats=8b1f1d54-53e4-11e6-85af-0256e416528f&metric=99th_latency&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=276.98&ymin=0&ymax=22.33], 
[second run with the revision order swapped|http://cstar.datastax.com/graph?command=one_job&stats=1abd3fe4-545e-11e6-8920-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=277.86&ymin=0&ymax=243951.4], 
and [an old run|http://cstar.datastax.com/graph?command=one_job&stats=16cef080-53dc-11e6-b967-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=282.92&ymin=0&ymax=249571.3] 
done before enabling token-aware routing in cassandra-stress.

* Single partition queries at CL.ALL: 
[single run|http://cstar.datastax.com/graph?command=one_job&stats=e2155410-5462-11e6-9cd7-0256e416528f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=277.75&ymin=0&ymax=246123.9]

There is a gap of 3.6K ops/second without token-aware routing and of 1K 
ops/second with CL=ALL. With token-aware routing the patch is instead 1K 
ops/second faster. These differences must arise from the refactoring of the 
select statement. The differences are very small (the test error seems to be 
around 0.5K ops/second), but I can look into them further if there are concerns.

> Implement streaming for bulk read requests
> ------------------------------------------
>
>                 Key: CASSANDRA-11521
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>              Labels: client-impacting, protocolv5
>             Fix For: 3.x
>
>         Attachments: final-patch-jfr-profiles-1.zip
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer 
> and eliminating the need to query individual pages one by one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
