[jira] [Comment Edited] (CASSANDRA-11542) Create a benchmark to compare HDFS and Cassandra bulk read times

Russell Alexander Spitzer (JIRA) Tue, 24 May 2016 19:12:05 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299326#comment-15299326 ]


Russell Alexander Spitzer edited comment on CASSANDRA-11542 at 5/25/16 2:11 AM:
--------------------------------------------------------------------------------

The benchmark looks good to me. I would only suggest you increase the volume of 
data in the run so that the ratio of time spent pulling data from C* to time 
spent setting up Spark work is higher.


was (Author: rspitzer):
The benchmark looks good to me. I would only suggest you increase the volume of 
data in the run so that the ratio of pulling data from C* to setting up Spark 
work is lower.

> Create a benchmark to compare HDFS and Cassandra bulk read times
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11542
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11542
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Testing
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments: jfr_recordings.zip, spark-load-perf-results-001.zip, 
> spark-load-perf-results-002.zip, spark-load-perf-results-003.zip
>
>
> I propose creating a benchmark for comparing Cassandra and HDFS bulk reading 
> performance. Simple Spark queries will be performed on data stored in HDFS or 
> Cassandra, and the entire duration will be measured. An example query would 
> be the max or min of a column or a count(*).
> This benchmark should allow determining the impact of:
> * partition size
> * number of clustering columns
> * number of value columns (cells)
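
The measurement loop described above could be sketched roughly as follows. This
is only an illustration, not the benchmark attached to this ticket: the names
time_query, parameter_grid, run_benchmark and stand_in_query_factory are
hypothetical, and the "query" is an in-memory stand-in rather than a real Spark
job reading from HDFS or Cassandra. It shows timing the entire duration of a
query for each combination of partition size, clustering columns and value
columns.

```python
import itertools
import time
from statistics import mean


def time_query(run_query, repeats=3):
    """Run `run_query` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def parameter_grid(partition_sizes, clustering_columns, value_columns):
    """Build every combination of the three schema parameters listed in the ticket."""
    return [
        {"partition_size": p, "clustering_columns": c, "value_columns": v}
        for p, c, v in itertools.product(
            partition_sizes, clustering_columns, value_columns
        )
    ]


def run_benchmark(make_query, grid, repeats=3):
    """Time the query built by `make_query(params)` for each parameter combination."""
    results = []
    for params in grid:
        durations = time_query(make_query(params), repeats)
        results.append({**params, "mean_seconds": mean(durations)})
    return results


def stand_in_query_factory(params):
    """Hypothetical stand-in for a Spark job: computes a max over in-memory rows.

    A real run would instead submit a Spark query (max, min or count) against
    HDFS or Cassandra; clustering_columns is unused in this stand-in.
    """
    rows = [[i] * params["value_columns"] for i in range(params["partition_size"])]
    return lambda: max(row[0] for row in rows)


grid = parameter_grid([100, 1000], [1, 2], [1, 5])
results = run_benchmark(stand_in_query_factory, grid, repeats=2)
for row in results:
    print(row)
```

Replacing stand_in_query_factory with a function that submits the actual Spark
job would keep the timing and parameter sweep unchanged, so results for the two
storage backends could be collected with the same harness.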



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
