[ 
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350901#comment-14350901
 ] 

Jonathan Shook commented on CASSANDRA-8929:
-------------------------------------------

Responding to [~jbellis], as we posted in parallel.

Short of having sampling support on the server side, I do not see us getting 
useful samples. In all the environments that we operate in, the most reliable 
tools we have are those that are built into Cassandra directly. This feature 
would allow us to stop reinventing the wheel with users every time we need to 
understand what their workload is with respect to POCs and forward planning. 
I've personally started leaning more and more on settraceprobability for this, 
but it comes with its own caveats. Having something tailored to sampling 
*just* the statements would save lots of time and energy.
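
To make "sampling just the statements" concrete, here is a minimal sketch in Python. It is not Cassandra code and not a proposed design. `StatementSampler`, `observe`, and `statement_mix` are hypothetical names, and the CQL strings in the usage note are invented. The only assumption borrowed from settraceprobability is the idea of keeping each request with a fixed probability.

```python
import random
from collections import Counter


class StatementSampler:
    """Hypothetical helper: keeps each observed statement with a fixed probability."""

    def __init__(self, probability, seed=None):
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self.probability = probability
        self._rng = random.Random(seed)
        self.samples = []

    def observe(self, statement):
        # Record only the statement text, not any per-request trace detail.
        if self._rng.random() < self.probability:
            self.samples.append(statement)

    def statement_mix(self):
        # Relative frequency of each sampled statement; these ratios could
        # serve as weights when generating a synthetic workload.
        counts = Counter(self.samples)
        total = sum(counts.values())
        if total == 0:
            return {}
        return {stmt: n / total for stmt, n in counts.items()}
```

With a probability of 0.01, roughly one statement in a hundred would be kept. Calling `observe("SELECT * FROM ks.tbl WHERE id = ?")` for every request and reading `statement_mix()` later gives the proportion of each statement shape among the samples.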

This is the type of feature for which there is no substitute when you need it. If 
we could go into a new environment and make reasonable suggestions for how to 
configure sampling up front, we would be able to simply refer back to the data 
for historic context, changes in workload patterns, changes in data rates, etc.

The short answer is no: I don't know of an easier way, given all the 
trade-offs.




> Workload sampling
> -----------------
>
>                 Key: CASSANDRA-8929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>
> Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build 
> something almost as useful by sampling the requests sent to a node and 
> building a synthetic workload with the same characteristics using the same 
> (or anonymized) schema.


