[ https://issues.apache.org/jira/browse/CASSANDRA-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299326#comment-15299326 ]
Russell Alexander Spitzer edited comment on CASSANDRA-11542 at 5/25/16 2:11 AM:
--------------------------------------------------------------------------------
The benchmark looks good to me. I would only suggest you increase the volume of
data in the run so that the ratio of pulling data from C* to setting up Spark
work is higher.
was (Author: rspitzer):
The benchmark looks good to me. I would only suggest you increase the volume of
data in the run so that the ratio of pulling data from C* to setting up Spark
work is lower.
> Create a benchmark to compare HDFS and Cassandra bulk read times
> ----------------------------------------------------------------
>
> Key: CASSANDRA-11542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11542
> Project: Cassandra
> Issue Type: Sub-task
> Components: Testing
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.x
>
> Attachments: jfr_recordings.zip, spark-load-perf-results-001.zip,
> spark-load-perf-results-002.zip, spark-load-perf-results-003.zip
>
>
> I propose creating a benchmark for comparing Cassandra and HDFS bulk reading
> performance. Simple Spark queries will be performed on data stored in HDFS or
> Cassandra, and the entire duration will be measured. An example query would
> be the max or min of a column or a count(*).
> This benchmark should allow determining the impact of:
> * partition size
> * number of clustering columns
> * number of value columns (cells)
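
As a rough illustration of the proposal quoted above, the following is a minimal sketch of a harness that times an entire query for each combination of partition size, clustering columns, and value columns. It is not the benchmark attached to this ticket. The names `run_benchmark` and `fake_max_query` are hypothetical, and the query is an in-memory stand-in; a real run would replace it with a Spark job reading from HDFS or Cassandra.

```python
import itertools
import time
from typing import Callable, Dict, List


def run_benchmark(query: Callable[[int, int, int], object],
                  partition_sizes: List[int],
                  clustering_columns: List[int],
                  value_columns: List[int],
                  repeats: int = 3) -> List[Dict[str, float]]:
    """Time `query` end to end for every parameter combination.

    The whole call is timed, so any per-query setup cost is included,
    matching the idea of measuring the entire duration.
    """
    results = []
    grid = itertools.product(partition_sizes, clustering_columns, value_columns)
    for psize, ncluster, nvalue in grid:
        durations = []
        for _ in range(repeats):
            start = time.perf_counter()
            query(psize, ncluster, nvalue)
            durations.append(time.perf_counter() - start)
        results.append({
            "partition_size": psize,
            "clustering_columns": ncluster,
            "value_columns": nvalue,
            # Keep the fastest run to reduce noise from warm-up effects.
            "best_seconds": min(durations),
        })
    return results


def fake_max_query(partition_size: int, clustering_columns: int,
                   value_columns: int) -> int:
    """Hypothetical stand-in for a Spark max() query (assumption, not real I/O).

    Generates `partition_size` rows with `value_columns` values each and
    returns the max of the first value column. `clustering_columns` is
    accepted only so the signature matches the harness; it is unused here.
    """
    rows = (tuple(i * (c + 1) for c in range(value_columns))
            for i in range(partition_size))
    return max(row[0] for row in rows)


if __name__ == "__main__":
    for row in run_benchmark(fake_max_query, [1_000, 100_000], [1, 2], [1, 5]):
        print(row)
```

Because fixed setup cost is part of every timed call, small inputs mostly measure setup; larger data volumes make the data-pulling portion dominate the result.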