[ https://issues.apache.org/jira/browse/CASSANDRA-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263751#comment-15263751 ]

Sylvain Lebresne commented on CASSANDRA-11542:
----------------------------------------------

bq. I still think we need CASSANDRA-11520 and CASSANDRA-11521, but I just want 
to make sure we tackle the bigger "bang for the buck" first.

Just to say that I generally agree with that statement, but I'll add that a 60% 
improvement (or even 30% for RDD) is not too shabby either, and both 
CASSANDRA-11520 and CASSANDRA-11521 seem doable without too much 
effort/disruption. That is, as far as I'm concerned, you've gathered enough 
evidence to show that they are worth doing, and I'd be happy to start there. 
That doesn't exclude continuing to dig in parallel, of course. Anyway, great 
work so far, thanks.

> Create a benchmark to compare HDFS and Cassandra bulk read times
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11542
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11542
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Testing
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments: spark-load-perf-results-001.zip, 
> spark-load-perf-results-002.zip
>
>
> I propose creating a benchmark for comparing Cassandra and HDFS bulk reading 
> performance. Simple Spark queries will be run against data stored in HDFS or 
> Cassandra, and the end-to-end duration will be measured. An example query would 
> be the max or min of a column, or a count(*) (see the sketch after the list below).
> This benchmark should allow determining the impact of:
> * partition size
> * number of clustering columns
> * number of value columns (cells)
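
For reference, here is a minimal sketch (Scala, using the spark-cassandra-connector) 
of the kind of timed comparison the description talks about; the connection host, 
keyspace, table and HDFS path are placeholders, not the actual spark-load-perf setup:

{code:scala}
// Minimal timing harness: run the same aggregate (a count) against a Cassandra
// table via the spark-cassandra-connector RDD API and against a CSV copy of the
// same data in HDFS, printing the wall-clock duration of each read.
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object BulkReadBenchmark {

  // Evaluate a block and print how long it took in milliseconds.
  def time[T](label: String)(block: => T): T = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label%-20s $elapsedMs%.1f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("bulk-read-benchmark")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
    val sc = new SparkContext(conf)

    // count(*) over a Cassandra table (placeholder keyspace/table).
    time("cassandra count") {
      sc.cassandraTable("ks", "benchmark_table").count()
    }

    // The same count over a CSV copy of the data in HDFS (placeholder path).
    time("hdfs count") {
      sc.textFile("hdfs:///benchmark/benchmark_table.csv").count()
    }

    sc.stop()
  }
}
{code}

The listed factors (partition size, clustering columns, value columns) would then be 
varied in the schema and generated data, re-running the same harness for each case.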


