Yes, I’m fairly sure you understood correctly. (I wrote most of this, but it 
was long enough ago that I can’t vouch for the details.)

It should be implemented like the Strings generator.  It looks like both 
HexStrings and HexBytes are incorrect, and have been for a long time.
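
To make the difference concrete, here is a minimal Python sketch. It is not the 
cassandra-stress code: draw_identity, per_character and per_value are 
hypothetical names, and the clamped normal draw is only a stand-in for 
gaussian(1..100). It contrasts drawing every character from the population 
distribution with drawing one identity per value and using it as a seed, which 
is my recollection of what the Strings generator does.

```python
import random
from collections import Counter

HEX = "0123456789abcdef"


def draw_identity(rng, lo=1, hi=100):
    """Hypothetical stand-in for gaussian(lo..hi): a clamped, rounded normal draw."""
    mean = (lo + hi) / 2
    stdev = (hi - lo) / 6
    return min(hi, max(lo, round(rng.gauss(mean, stdev))))


def per_character(rng, size=32):
    # Behaviour described in the question: every character is a separate
    # draw from the population distribution.
    return "".join(HEX[draw_identity(rng) % 16] for _ in range(size))


def per_value(rng, size=32):
    # Assumed intended behaviour: one draw picks the identity of the whole
    # value, and that identity seeds a private RNG that builds the string.
    seed = draw_identity(rng)
    chars = random.Random(seed)
    return "".join(chars.choice(HEX) for _ in range(size))


def distinct_values(generator, n=20000, seed=42):
    # Count how often each generated string occurs over n draws.
    rng = random.Random(seed)
    return Counter(generator(rng) for _ in range(n))
```

With per_value there can never be more than 100 distinct strings, and their 
frequencies follow the population distribution. With per_character almost 
every generated string is unique, which matches what Saleil observed.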


> On 12 Dec 2018, at 22:27, Saleil Bhat (BLOOMBERG/ 731 LEX) 
> <sbha...@bloomberg.net> wrote:
> 
> Hi, 
> 
> I have a question about the behavior of the HexStrings value generator in the 
> cassandra-stress tool, particularly concerning its population/identity 
> distribution.  
> 
> 
> Per the discussion in JIRA item CASSANDRA-6146 concerning the stress YAML 
> profile, the population field in a columnspec “represents the total unique 
> population distribution of that column across rows.”
> 
> 
> I interpreted this to mean that if I specify some distribution 'F' for a 
> column, then the probability of occurrence for each potential value of that 
> column is given by 'F'. 
> 
> So, for example, if I provided the following columnspec for a text column: 
>     name: fake_column
>     size: fixed(32)
>     population: gaussian(1..100)
> and then generated a large amount of data according to this specification, 
> I would expect there to be 100 distinct values for ‘fake_column’, and that a 
> histogram of the frequency of occurrence of each value would be roughly 
> bell-shaped. 
> 
> 
> 
> However, the current implementation of the HexStrings generator deviates from 
> this expectation. In the current implementation, each CHARACTER in the string 
> is drawn from F, rather than the string as a whole. Therefore, if you plot 
> the histogram of frequency of occurrence for each character, you get a 
> bell-shaped curve, but the distribution of the occurrences of whole strings 
> (the actual columns) is something else. 
> 
> 
> My question is, is this the desired behavior for string columns? Was my 
> expectation/interpretation incorrect? If so, can anyone give some insight as 
> to why strings are designed to behave this way and what the use case is for 
> this behavior? 
> 
> Thanks, 
> -Saleil 


