[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051902#comment-14051902 ]
Benedict edited comment on CASSANDRA-6146 at 7/3/14 9:09 PM:
-------------------------------------------------------------

bq. You can reproduce by changing the default clustering distribution to uniform(1..1024)

Well, since there are 6 clustering components, a uniform(1..1024) default distribution would yield an _average_ of 512^6 (= (2^9)^6 = 2^54) rows per partition. Not surprisingly, this causes an overflow in the calculations. It's probably worth detecting this and letting people know the size is absurdly large when it happens, and also worth using double instead of float everywhere we calculate a probability.

bq. no_warmup option doesn't work

Good spot. I didn't wire it up.

bq. The value component generator uses the seed of the last clustering component so it always gets the same value for all rows in a partition, since the seeds are cached.

-Ah, you mean all _leaf_ rows (i.e. those sharing the second-lowest level clustering component) are the same? Well spotted, this is an off-by-1 bug, and I wasn't using a clustering>1 for the leaf. It shouldn't be the case that they are the same for the whole partition.-

Ah, nuts, the off-by-1 would cause it to always generate the same seeds. Whoops.

bq. I'm concerned we won't be able to explain how to use this to joe user but perhaps if we come up with better terminology it and some visual examples it will make more sense. For example the clustering distribution is used to define the possible values in a single partition? if you have a population of uniform(1..1000) and clustering of fixed(1) you only see one value per partition

We may need to bikeshed the nomenclature. I don't think clustering is that tough though: it is the number of instances of that component for each instance of its parent (i.e. for C components with average N clustering, there will be N^C rows on average).
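To make the N^C arithmetic concrete, here is a minimal Java sketch. It is not code from the attached patches: the class `RowsPerPartition` and the method `expectedRows` are hypothetical names, and the overflow check only illustrates one way such a size could be detected. The float comparison at the end demonstrates float's 24-bit significand in general, not the specific failure seen in stress.

```java
public class RowsPerPartition {
    /**
     * Hypothetical helper: average rows per partition for `components`
     * clustering components, each averaging `avgClustering` instances per
     * parent (N^C). Uses Math.multiplyExact so that a result too large for
     * a long throws ArithmeticException instead of silently wrapping.
     */
    static long expectedRows(long avgClustering, int components) {
        long rows = 1;
        for (int i = 0; i < components; i++) {
            rows = Math.multiplyExact(rows, avgClustering);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Six components averaging 512 each: 512^6 = 2^54.
        long rows = expectedRows(512, 6);
        System.out.println(rows);                     // 18014398509481984
        System.out.println(rows > Integer.MAX_VALUE); // true: far too big for an int

        // float cannot represent 2^24 + 1, while double can.
        System.out.println((float) 16777217L == 16777216f);  // true
        System.out.println((double) 16777217L == 16777217d); // true

        // 1024^7 = 2^70 does not fit in a long, so the helper throws.
        try {
            expectedRows(1024, 7);
        } catch (ArithmeticException e) {
            System.out.println("absurdly large partition size");
        }
    }
}
```

A check of this kind could run once when the profile is parsed, so the user sees a warning before any work is generated.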
The only complex bit IMO is the updateratio and useratio; perhaps we could relabel these to 'rowspervisit' and 'rowsperbatch' and indicate in the description that they are ratios.

was (Author: benedict):
bq. You can reproduce by changing the default clustering distribution to uniform(1..1024)

Well, since there are 6 clustering components, a uniform(1..1024) default distribution would yield an _average_ of 512^6 (= (2^9)^6 = 2^54) rows per partition. Not surprisingly, this causes an overflow in the calculations. It's probably worth detecting this and letting people know the size is absurdly large when it happens, and also worth using double instead of float everywhere we calculate a probability.

bq. no_warmup option doesn't work

Good spot. I didn't wire it up.

bq. The value component generator uses the seed of the last clustering component so it always gets the same value for all rows in a partition, since the seeds are cached.

Ah, you mean all _leaf_ rows (i.e. those sharing the second-lowest level clustering component) are the same? Well spotted, this is an off-by-1 bug, and I wasn't using a clustering>1 for the leaf. It shouldn't be the case that they are the same for the whole partition.

bq. I'm concerned we won't be able to explain how to use this to joe user but perhaps if we come up with better terminology it and some visual examples it will make more sense. For example the clustering distribution is used to define the possible values in a single partition? if you have a population of uniform(1..1000) and clustering of fixed(1) you only see one value per partition

We may need to bikeshed the nomenclature. I don't think clustering is that tough though: it is the number of instances of that component for each instance of its parent (i.e. for C components with average N clustering, there will be N^C rows on average).
The only complex bit IMO is the updateratio and useratio; perhaps we could relabel these to 'rowspervisit' and 'rowsperbatch' and indicate in the description that they are ratios.

> CQL-native stress
> -----------------
>
>                 Key: CASSANDRA-6146
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: T Jake Luciani
>             Fix For: 2.1.1
>
>         Attachments: 6146-v2.txt, 6146.txt, 6164-v3.txt
>
>
> The existing CQL "support" in stress is not worth discussing. We need to
> start over, and we might as well kill two birds with one stone and move to
> the native protocol while we're at it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)