[ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077050#comment-14077050
 ] 

Brandon Williams commented on CASSANDRA-7631:
---------------------------------------------

bq. Stress seems like a perfectly reasonable place to put this, really. It also 
means we know the data generated is compatible with the stress workload, which 
is important.

I agree with your latter point, but we could still reuse the code in a separate 
utility.  It just seems like stress has enough options as it is, and 
introducing an sstable writer would make a lot of them nonsensical (like 
consistency level, replication, etc.).  I'd somewhat prefer having a clear 
delineation, util-wise, between going over the network and writing to disk.

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it 
> takes to initially load data. For machines with a large amount of RAM this 
> becomes especially onerous, because a very large amount of data needs to be 
> placed on the machine before the page cache can be circumvented. 
> To remedy this, I suggest we add a top-level flag to cassandra-stress which 
> would cause the tool to write directly to sstables rather than actually 
> performing CQL inserts. Internally this would use CQLSSTableWriter to write 
> directly to sstables while skipping any keys which are not owned by the node 
> stress is running on. The same stress command run on each node in the cluster 
> would then write unique sstables containing only the data which that node is 
> responsible for. Following this, no further network IO would be required to 
> distribute data, as it would all already be correctly in place.
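
The "skip any keys not owned by this node" step in the description could be sketched roughly as follows. This is only an illustration of the idea, not the proposed patch: the helper names (token, owner, keys_for_node) are hypothetical, the MD5-based token is a toy stand-in for a real partitioner, and it assumes one token per node with no replication.

```python
import hashlib
from bisect import bisect_left


def token(key: bytes) -> int:
    """Toy stand-in for a partitioner: map a key to an unsigned 64-bit token."""
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")


def owner(ring, key: bytes) -> str:
    """Return the node owning `key`.

    `ring` is a list of (token, node) pairs sorted by token. A key belongs to
    the first node whose token is >= the key's token, wrapping around to the
    first node when the key's token is larger than every node token.
    """
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token(key))
    return ring[i % len(ring)][1]


def keys_for_node(ring, node: str, keys):
    """Keep only the keys that `node` is responsible for; skip the rest."""
    return [k for k in keys if owner(ring, k) == node]


if __name__ == "__main__":
    # Hypothetical three-node ring with evenly spaced tokens.
    ring = sorted((i * 2**64 // 3, f"node{i}") for i in range(3))
    keys = [f"key{i}".encode() for i in range(1000)]
    for _, node in ring:
        local = keys_for_node(ring, node, keys)
        print(node, "would write", len(local), "keys locally")
```

Running the same key sequence through this filter on every node gives disjoint subsets that together cover all keys, which is the property the description relies on to avoid any later network transfer.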



--
This message was sent by Atlassian JIRA
(v6.2#6252)
