[ 
https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421942#comment-13421942
 ] 

Hoss Man commented on SOLR-3673:
--------------------------------

Greg: for non-math folks like me, can you explain the utility of this? i.e.,
what is an example use case that this helps solve?


One thing that jumps out at me is that the usage of the Random generators seems
completely non-deterministic. That may _seem_ desirable in code dealing with
random numbers, but in the case of a Solr function I don't think it is.

In particular, it looks like the values returned for each doc by the
intVal/floatVal/etc. methods on the anonymous FunctionValues instance
returned by your RandomFunction class depend on the order in which they are
called, and won't be consistent if they are called multiple times for the
same docid. So not only will multiple (identical) requests get different
random values for the same document, but within a single request, asking for
the value of a single document multiple times will give you different values.
I believe that will wreak havoc on any attempt to sort by these functions
(and could easily cause problems if they are wrapped in other functions that
expect determinism).

Does that make sense?

I think at a minimum we should probably add a "seed" argument to all of these
functions (similar to how RandomSortField uses the field name as a seed) so
that people who want consistent values can get them from consistent input
(assuming, of course, that everything else about the request and the index
is equal); people who don't can just pass in a new seed.
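
A minimal sketch of the seed idea, assuming each doc's value is computed as a pure function of (seed, docid). The class `SeededGaussianValues`, the SplitMix64 hash, and the Box-Muller transform are choices made for this illustration, not part of Solr or the patch. Because nothing is carried between calls, the same seed and docid always produce the same value, and a new seed gives a new, equally repeatable set of values.

```java
/**
 * HYPOTHETICAL sketch of a seeded, stateless rgaussian-style value source.
 * Each doc's value depends only on (seed, docid), so repeated calls,
 * repeated requests, and any call order all give the same answer.
 * SplitMix64's finalizer and the Box-Muller transform are choices made
 * for this sketch, not taken from the patch.
 */
class SeededGaussianValues {
    private final long seed;
    private final double mean;
    private final double stddev;

    SeededGaussianValues(long seed, double mean, double stddev) {
        this.seed = seed;
        this.mean = mean;
        this.stddev = stddev;
    }

    /** SplitMix64 finalizer: scrambles a 64-bit key into a well-mixed hash. */
    private static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Uniform double in [0, 1) derived from (seed, doc, stream). */
    private double uniform(int doc, int stream) {
        long key = ((long) doc << 1) | stream; // stream is 0 or 1
        long h = mix64(seed ^ mix64(key));
        return (h >>> 11) * 0x1.0p-53; // top 53 bits -> [0, 1)
    }

    /** Box-Muller: two uniforms -> one standard normal, then scale and shift. */
    double doubleVal(int doc) {
        double u1 = 1.0 - uniform(doc, 0); // (0, 1], safe for Math.log
        double u2 = uniform(doc, 1);
        double z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
        return mean + stddev * z;
    }

    public static void main(String[] args) {
        SeededGaussianValues a = new SeededGaussianValues(1234L, 0.0, 1.0);
        SeededGaussianValues b = new SeededGaussianValues(1234L, 0.0, 1.0);
        // Same seed and docid -> same value, whichever instance or call order.
        System.out.println(a.doubleVal(7) + " == " + b.doubleVal(7));
    }
}
```

Being stateless, this avoids both the ordering problem and any per-request memory; whether a hash-derived stream is a good enough RNG for the proposal's purposes is a separate question.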

Even if we do that, though, I'm still worried about intVal(docid) returning
different values if it's called multiple times in a single request. It may
make sense to (precompute and) cache the random values, if not long term then
at least for the lifespan of the FunctionValues instance.
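
The caching alternative could look roughly like this, assuming a value source that knows its `maxDoc` up front. `CachedGaussianValues` is a hypothetical name; it draws every doc's value once, in docid order, from a `java.util.Random` seeded with the user-supplied seed, and afterwards only reads from the array.

```java
import java.util.Random;

/**
 * HYPOTHETICAL sketch of the "(precompute and) cache" idea: each doc's value
 * is drawn once, in docid order, from a generator seeded with the
 * user-supplied seed, then served from an array for as long as this
 * instance lives.
 */
class CachedGaussianValues {
    private final double[] values;

    CachedGaussianValues(long seed, int maxDoc, double mean, double stddev) {
        Random rng = new Random(seed);
        values = new double[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            values[doc] = mean + stddev * rng.nextGaussian();
        }
    }

    /** Same doc, same value, no matter how often or in what order it is asked for. */
    double doubleVal(int doc) {
        return values[doc];
    }

    public static void main(String[] args) {
        CachedGaussianValues cache = new CachedGaussianValues(42L, 10, 0.0, 1.0);
        System.out.println(cache.doubleVal(5) + " == " + cache.doubleVal(5));
    }
}
```

The trade-off is one `double` per document for the lifetime of the instance, and, as with any docid-keyed scheme, values stay consistent only while the index (and therefore the docid layout) is unchanged.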

What do you think?
                
> Random variate functions
> ------------------------
>
>                 Key: SOLR-3673
>                 URL: https://issues.apache.org/jira/browse/SOLR-3673
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0, 5.0
>            Reporter: Greg Bowyer
>            Assignee: Greg Bowyer
>         Attachments: SOLR-3673.patch
>
>
> Hi all,
> At my $DAYJOB I have been asked to build a few random variate functions that
> return random numbers drawn from a given distribution.
> I think these can be added to Solr.
> My one hesitation is that the code as written uses (and needs) Uncommons Maths
> (because we want a far better RNG than Java's, and because I am lazy and did
> not want to write the distributions myself).
> Uncommons Maths is under the Apache license, so we are good on that front.
> Does anyone have any thoughts on this?
> For reference, the functions are:
> rgaussian(mean, stddev) -> random value drawn from a Gaussian distribution
> rpoisson(mean) -> random value drawn from a Poisson distribution
> rbinomial(n, prob) -> random value drawn from a binomial distribution
> rcontinous(min, max) -> random continuous value between min and max
> rdiscrete(min, max) -> random discrete value between min and max
> rexponential(rate) -> random value drawn from an exponential distribution
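
For readers without a math background, the proposed functions can be sketched with textbook algorithms on top of `java.util.Random`. This swaps out the Uncommons Maths dependency purely so the example is self-contained (the proposal prefers Uncommons Maths for its better RNG), and the class `Variates` and its method names are hypothetical. Like the code criticized in the comment, it is stateful, so it would still need seeding or caching before use inside a FunctionValues.

```java
import java.util.Random;

/**
 * HYPOTHETICAL sketch of the six proposed distributions using textbook
 * algorithms on java.util.Random (swapped in for Uncommons Maths only to
 * keep the example self-contained). Method names mirror the proposed
 * Solr function names.
 */
class Variates {
    private final Random rng;

    Variates(long seed) {
        this.rng = new Random(seed);
    }

    /** rgaussian(mean, stddev): scale and shift a standard normal. */
    double gaussian(double mean, double stddev) {
        return mean + stddev * rng.nextGaussian();
    }

    /** rpoisson(mean): Knuth's multiplication method; only suitable for small means. */
    int poisson(double mean) {
        double limit = Math.exp(-mean);
        int k = 0;
        double p = rng.nextDouble();
        while (p > limit) {
            k++;
            p *= rng.nextDouble();
        }
        return k;
    }

    /** rbinomial(n, prob): count successes in n independent Bernoulli trials. */
    int binomial(int n, double prob) {
        int successes = 0;
        for (int i = 0; i < n; i++) {
            if (rng.nextDouble() < prob) {
                successes++;
            }
        }
        return successes;
    }

    /** rcontinous(min, max): uniform real in [min, max); rounding can very rarely give max. */
    double continuous(double min, double max) {
        return min + (max - min) * rng.nextDouble();
    }

    /** rdiscrete(min, max): uniform integer, ASSUMING both ends inclusive and a range that fits in an int. */
    int discrete(int min, int max) {
        return min + rng.nextInt(max - min + 1);
    }

    /** rexponential(rate): inverse-transform sampling, -ln(1 - U) / rate. */
    double exponential(double rate) {
        return -Math.log(1.0 - rng.nextDouble()) / rate;
    }

    public static void main(String[] args) {
        Variates v = new Variates(42L);
        System.out.println("gaussian:    " + v.gaussian(10.0, 2.0));
        System.out.println("poisson:     " + v.poisson(4.0));
        System.out.println("binomial:    " + v.binomial(10, 0.3));
        System.out.println("continuous:  " + v.continuous(2.0, 5.0));
        System.out.println("discrete:    " + v.discrete(3, 6));
        System.out.println("exponential: " + v.exponential(2.0));
    }
}
```

Knuth's Poisson method takes time proportional to the mean and loses accuracy once `Math.exp(-mean)` underflows, which is one reason to lean on a library's generators rather than hand-rolled ones.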

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
