[
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350856#comment-14350856
]
Benedict commented on CASSANDRA-8929:
-------------------------------------
So the goal is for users to do this as an acceptance phase prior to deploying
an upgrade?
We can certainly work to make it easier to produce a good profile (manually or
otherwise), and I think better example profiles that we use for testing will go
a long way towards this.
I do like the _idea_ of automatic generation, but it's not a simple task, and
it will touch quite a few integral codepaths. We need at minimum, for each
update, to sample presence, size and compressibility for each column, along
with a frequency distribution of partition key participation, and CQL row
participation (i.e. for each partition key, we need to reconstruct the
distribution of updates for each row within it). Simply collecting this is
non-trivial. Constructing a profile from this data - once stress supports all
of the functionality encountered - probably isn't super challenging
conceptually, as we can calculate a best-fit distribution for the data we've
sampled. It's still a significant chunk of work though. I do wonder if we can't
instead create a tool for generating this from an analysis of sstables combined
with some user provided data, as it would be easier to build and maintain
without being intertwined with the C* code, possibly alongside some very
simple sampling of just the frequency of given CQL statements.
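
To make the per-update sampling above a bit more concrete, here is a rough
sketch in Python. It is illustrative only: the UpdateSampler class and its
method names are hypothetical, zlib is just a stand-in proxy for
compressibility, and none of this reflects actual C* or stress code.

```python
import zlib
from collections import Counter, defaultdict


class UpdateSampler:
    """Hypothetical collector; not part of Cassandra or cassandra-stress."""

    def __init__(self):
        self.updates = 0
        self.present = Counter()               # column -> updates that set it
        self.raw_bytes = Counter()             # column -> total uncompressed bytes
        self.packed_bytes = Counter()          # column -> total bytes after zlib
        self.partition_hits = Counter()        # partition key -> updates
        self.row_hits = defaultdict(Counter)   # partition key -> row key -> updates

    def record(self, partition_key, row_key, columns):
        """Record one update; `columns` maps column name -> bytes value."""
        self.updates += 1
        self.partition_hits[partition_key] += 1
        self.row_hits[partition_key][row_key] += 1
        for name, value in columns.items():
            self.present[name] += 1
            self.raw_bytes[name] += len(value)
            # Assumption: zlib output size approximates compressibility.
            self.packed_bytes[name] += len(zlib.compress(value))

    def column_summary(self, name):
        """Presence rate, mean size and compression ratio for one column."""
        seen = self.present[name]
        if seen == 0:
            return {"presence": 0.0, "mean_size": 0.0, "compress_ratio": None}
        raw = self.raw_bytes[name]
        return {
            "presence": seen / self.updates,
            "mean_size": raw / seen,
            "compress_ratio": self.packed_bytes[name] / raw if raw else None,
        }
```

Calling record() once per sampled update and then column_summary() per column
would give the raw inputs for a profile; fitting a distribution to
partition_hits and row_hits is the separate step described above and is not
attempted here.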
> Workload sampling
> -----------------
>
> Key: CASSANDRA-8929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
> Project: Cassandra
> Issue Type: New Feature
> Components: Tools
> Reporter: Jonathan Ellis
>
> Workload *recording* looks to be unworkable (CASSANDRA-6572). We could build
> something almost as useful by sampling the requests sent to a node and
> building a synthetic workload with the same characteristics using the same
> (or anonymized) schema.