[ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077050#comment-14077050 ]
Brandon Williams commented on CASSANDRA-7631:
---------------------------------------------

bq. Stress seems like a perfectly reasonable place to put this, really. It also means we know the data generated is compatible with the stress workload, which is important.

I agree with your latter point, but we could still reuse the code in a separate utility. It just seems like stress has enough options as it is, and introducing an sstable writer would make a lot of them nonsensical (like consistency level, replication, etc.). I'd somewhat prefer having a clear delineation, util-wise, between going over the network and writing to disk.

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it
> takes to initially load data. For machines with a large amount of RAM this
> becomes especially onerous because a very large amount of data needs to be
> placed on the machine before page-cache can be circumvented.
> To remedy this I suggest we add a top-level flag to Cassandra-Stress which
> would cause the tool to write directly to sstables rather than actually
> performing CQL inserts. Internally this would use CQLSSTableWriter to write
> directly to sstables while skipping any keys which are not owned by the node
> stress is running on. The same stress command run on each node in the cluster
> would then write unique sstables only containing data which that node is
> responsible for. Following this, no further network IO would be required to
> distribute data as it would all already be correctly in place.
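
To make the "skip any keys not owned by this node" idea in the description concrete, here is a rough sketch of the ownership filter. It is only an illustration under several assumptions: the MD5-based `token_for`, the three-node ring, and the helper names `owner_of` and `keys_for_node` are all hypothetical, replication is ignored (every key has exactly one owner), and nothing here touches the real partitioner or sstable writer. In the actual proposal the surviving keys would be handed to the sstable writer rather than collected in a list.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 64  # hypothetical token space, not Cassandra's real token range


def token_for(key):
    """Map a partition key to a token (MD5 is a stand-in for the real partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner_of(token, ring):
    """Return the node owning `token`: the first node whose token is >= it,
    wrapping around to the first node on the ring."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)
    return ring[idx % len(ring)][1]


def keys_for_node(keys, ring, local_node):
    """Yield only the keys that `local_node` owns; all other keys are skipped."""
    for key in keys:
        if owner_of(token_for(key), ring) == local_node:
            yield key


# Three hypothetical nodes with evenly spaced tokens, sorted by token.
ring = sorted((i * RING_SIZE // 3, "node%d" % i) for i in range(3))
keys = ["key%d" % i for i in range(1000)]

# Every node walks the same generated key sequence but keeps a disjoint slice.
per_node = {name: list(keys_for_node(keys, ring, name)) for _, name in ring}
```

Because each node applies the same deterministic filter to the same key sequence, the per-node slices are disjoint and together cover every key, which is the property the description relies on to avoid redistributing data over the network afterwards.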