[ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081116#comment-14081116 ]
Benedict commented on CASSANDRA-7631:
-------------------------------------
OK, in that case, a few assorted thoughts on this:
1) I suspect you're not blocking on the ABQ (ArrayBlockingQueue) itself, but on
the single thread you have consuming from it (and having this separate thread
is bad anyway). Your profiler is likely misattributing the time because of the
rapid thread sleeping/waking there.
2) For now, we should complain if the whole partition isn't being inserted in
this mode.
3) We should create the CF on each individual thread, append each one unsorted
onto a ConcurrentLinkedQueue, and track the total memory used in the buffer. A
separate thread should then sort the buffered CFs by partition key and flush
them to disk once we exceed our flush threshold (much like memtable
flushing).
4) We should modify the PartitionGenerator to support sorting the clustering
components it generates; this way we can reduce the sorting cost fairly
dramatically, as sorting individual components is much cheaper than sorting all
components at once.
5) Ideally we would visit the partition keys in approximately sorted order, so
that we can flush a single file, as this will be most efficient for loading.
This will require a minor portion of the changes I'll be introducing soon for
more realistic workload generation, and then a custom SeedGenerator that
(externally) pre-sorts the seeds based on the partitions they generate.
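
To make point 3 concrete, here is a minimal sketch of the buffering scheme,
not the actual patch. Everything in it is an assumption for illustration: the
class name BufferedPartitionFlusher, the Partition stand-in (a String key plus
a byte size instead of a real CF), and sorting by key string rather than by
partitioner token order. flush() returns the sorted keys where a real
implementation would write an sstable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: workers append unsorted, one flusher sorts and writes.
class BufferedPartitionFlusher {

    // Hypothetical stand-in for a generated CF: a key and its size in bytes.
    static final class Partition {
        final String key;
        final long sizeBytes;

        Partition(String key, long sizeBytes) {
            this.key = key;
            this.sizeBytes = sizeBytes;
        }
    }

    private final ConcurrentLinkedQueue<Partition> buffer = new ConcurrentLinkedQueue<>();
    private final AtomicLong bufferedBytes = new AtomicLong();
    private final long flushThresholdBytes;

    BufferedPartitionFlusher(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    // Called from every worker thread; lock-free and order-agnostic.
    void add(Partition p) {
        buffer.add(p);
        bufferedBytes.addAndGet(p.sizeBytes);
    }

    // The flusher thread polls this to decide when to flush.
    boolean shouldFlush() {
        return bufferedBytes.get() >= flushThresholdBytes;
    }

    // Called from the single flusher thread: drain what is queued, sort it by
    // key, and hand it to the writer. The byte counter can lag briefly behind
    // the queue under concurrent adds, but it converges once they complete.
    List<String> flush() {
        List<Partition> drained = new ArrayList<>();
        long drainedBytes = 0;
        for (Partition p = buffer.poll(); p != null; p = buffer.poll()) {
            drained.add(p);
            drainedBytes += p.sizeBytes;
        }
        bufferedBytes.addAndGet(-drainedBytes);

        drained.sort(Comparator.comparing((Partition p) -> p.key));

        List<String> sortedKeys = new ArrayList<>();
        for (Partition p : drained) {
            sortedKeys.add(p.key); // a real writer would append to an sstable here
        }
        return sortedKeys;
    }
}
```

The point of the split is that worker threads never sort or block on each
other; the only sorting cost is paid once per flush, on the flusher thread.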
> Allow Stress to write directly to SSTables
> ------------------------------------------
>
> Key: CASSANDRA-7631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Russell Alexander Spitzer
> Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it
> takes to initially load data. For machines with a large amount of RAM this
> becomes especially onerous because a very large amount of data needs to be
> placed on the machine before page-cache can be circumvented.
> To remedy this I suggest we add a top-level flag to cassandra-stress which
> would cause the tool to write directly to sstables rather than actually
> performing CQL inserts. Internally this would use CQLSSTableWriter to write
> directly to sstables while skipping any keys which are not owned by the node
> stress is running on. The same stress command run on each node in the cluster
> would then write unique sstables containing only the data that node is
> responsible for. After this, no further network IO would be required to
> distribute data, as it would all already be correctly in place.