[
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080429#comment-14080429
]
Russell Alexander Spitzer commented on CASSANDRA-7631:
------------------------------------------------------
Back on topic: I've been running a series of experiments to see how much
faster (if at all) writing through CQLSSTableWriter would be than just using
the native client.
Here are some quick numbers from my MacBook, with C* also running on the same
MacBook for the native protocol runs:
{code}
NOOP    = Just generate a row don't do anything with it (I know this may be
          optimized out)
Native  = Run using -mode native cql3
SSTable = Run passing rows to a queue which is consumed by a single thread
          running CQLSSTableWriter

n=1M using the example user profile
user n=1000000 no_warmup profile=cqlstress-example.yaml ops(insert=1) -rate threads=N -mode (sstable|native cql3)

Partitions Per Second
Threads    NOOP   Native  SSTable
      1   22765    10165    20917
      2   38333    17247    38659
      4   58089    26920    33956
      8   72434    33507    29354
     16   87837    34195    29354
{code}
So while a single SSTable writer can keep up with the generator at 1-2
threads, throughput falls off beyond that; it looks like contention over the
ArrayBlockingQueue puts a ceiling on performance. I'm going to look into
getting a thread-safe version of the SSTableWriter tomorrow (at the very least
there is contention on file naming). Hopefully we'll be able to just tie a
separate SSTableWriter to each generator thread.
If all else fails, we can have them write to different directories and then
rename the sstables when we have finished.
> Allow Stress to write directly to SSTables
> ------------------------------------------
>
> Key: CASSANDRA-7631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Russell Alexander Spitzer
> Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it
> takes to initially load data. For machines with a large amount of RAM this
> becomes especially onerous, because a very large amount of data needs to be
> placed on the machine before the page cache can be circumvented.
> To remedy this, I suggest we add a top-level flag to cassandra-stress which
> would cause the tool to write directly to sstables rather than actually
> performing CQL inserts. Internally this would use CQLSSTableWriter to write
> directly to sstables while skipping any keys which are not owned by the node
> stress is running on. The same stress command run on each node in the cluster
> would then write unique sstables containing only the data which that node is
> responsible for. After this, no further network IO would be required to
> distribute data, as it would all already be correctly in place.
--
This message was sent by Atlassian JIRA
(v6.2#6252)