[ https://issues.apache.org/jira/browse/CASSANDRA-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-7980:
----------------------------------------
    Assignee:     (was: Branimir Lambov)

> cassandra-stress should support partial clustering column generation
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-7980
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7980
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Testing
>            Reporter: Benedict
>            Priority: Minor
>
> cassandra-stress generates its data randomly, in tiers, so that we can scroll
> through the partitions it generates without having to generate them in their
> entirety. The problem is that to support very large partitions (important for
> benchmarking certain cases, and for acceptance testing) we need a large
> number of clustering columns - generally more than we would otherwise have,
> which changes the performance characteristics. We should effectively split
> each clustering column into a number of byte-ranges that become tiers for
> visitation. The only real complexity here is in obeying the specified
> size/count distribution range, which would be difficult for exponential
> distributions. However, we could require the user to specify the ranges, and
> the distribution for each range, upfront. We could even treat them exactly
> like other column specifications, but as sub-specs within a given column in
> the yaml. Or, we could simply accept that we follow the distribution
> imperfectly in these situations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
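
To make the proposal in the issue concrete, here is a minimal sketch of splitting a single clustering column into byte-range tiers. It is not cassandra-stress code: the names `tier_values`, `iter_values` and `iter_values_from`, the `(width, count)` tier description, and the use of fixed counts rather than a size/count distribution are all assumptions made for illustration. Each tier contributes a fixed number of bytes to the column value, and the children of a prefix are derived deterministically from the seed and that prefix, so a reader can resume from any value without generating the subtrees that sort before it.

```python
import hashlib
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical tier description: (width in bytes, number of children per parent).
Tier = Tuple[int, int]


def tier_values(seed: int, prefix: bytes, width: int, count: int) -> List[bytes]:
    """Derive `count` distinct, sorted byte-ranges under `prefix`.

    The result depends only on (seed, prefix), so any subtree can be
    regenerated on demand. Widths are assumed small (a few bytes).
    """
    digest = hashlib.sha256(seed.to_bytes(8, "big") + prefix).digest()
    rng = random.Random(digest)
    space = 256 ** width
    picks = rng.sample(range(space), min(count, space))
    return sorted(p.to_bytes(width, "big") for p in picks)


def iter_values(seed: int, tiers: Sequence[Tier], prefix: bytes = b"") -> Iterator[bytes]:
    """Yield every value of the column in sorted order, one tier at a time."""
    if not tiers:
        yield prefix
        return
    width, count = tiers[0]
    for chunk in tier_values(seed, prefix, width, count):
        yield from iter_values(seed, tiers[1:], prefix + chunk)


def iter_values_from(seed: int, tiers: Sequence[Tier], start: bytes,
                     prefix: bytes = b"") -> Iterator[bytes]:
    """Yield values >= `start`, skipping whole subtrees that sort before it."""
    if not tiers:
        yield prefix
        return
    width, count = tiers[0]
    lower = start[:width]
    for chunk in tier_values(seed, prefix, width, count):
        if chunk < lower:
            continue  # this subtree is entirely before `start`; never expanded
        # Only the subtree that matches the start prefix keeps a lower bound.
        rest = start[width:] if chunk == lower else b""
        yield from iter_values_from(seed, tiers[1:], rest, prefix + chunk)
```

For example, `iter_values(42, [(1, 4), (2, 8)])` treats a 3-byte column value as two tiers and yields 32 values, and `iter_values_from` with the same arguments plus one of those values resumes the scan at that value. Following a user-specified size/count distribution per range, as the issue discusses, would replace the fixed `count` and is not attempted here.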