On 7/12/15 10:38 AM, Thomas Neidhart wrote:
> On 07/12/2015 04:58 PM, Phil Steitz wrote:
>> On 7/12/15 2:50 AM, Thomas Neidhart wrote:
>>> On 07/11/2015 09:43 PM, Phil Steitz wrote:
>>>> On 7/11/15 12:29 PM, Thomas Neidhart wrote:
>>>>> On 07/11/2015 09:08 PM, Phil Steitz wrote:
>>>>>> The code implemented in MATH-1242 to improve performance of KS
>>>>>> monteCarloP in-lines efficient generation of random boolean arrays.
>>>>>>   Unfortunately, I think the implementation is not quite random (see
>>>>>> comments on the ticket).  To verify it, we need to be able to test
>>>>>> the random boolean array generation directly.  To do that, we have
>>>>>> to either expose the method (at least as protected) in the KS class
>>>>>> or add it somewhere else.  I propose the latter but am not sure
>>>>>> where to put it.  For speed, we need to avoid array copies, so the
>>>>>> API will have to be something like randomize(boolean[], nTrue).  It
>>>>>> could go in the swelling MathArrays class, or RandomDataGenerator. 
>>>>>> The latter probably makes more sense, but the API does not fit too
>>>>>> well.  Any ideas?
>>>>> If it is just for testing purposes, you can also access the method in
>>>>> question via reflection, see an example here:
>>>>> http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
>>>> Do you think it *should* be a private method of the K-S class?
>>> Right now, I do not see much uses outside the class, but if we decide to
>>> make it public then I would prefer a special util class in the random
>>> package to avoid cluttering the MathArrays class.
>> OK, for now we can make it private and use the reflection method
>> above to test it.
> ok, but I guess it is also fine to make it package private as sebb
> suggested. We did something similar recently for some of the improved
> sampling methods provided by Otmar.
>
>>> Regarding the RandomDataGenerator: I think this class should be
>>> deprecated and replaced by a Sampler interface as proposed by Gilles.
>> Please consider keeping this class.  Consider this a user request. 
>> I have quite a few applications that use this class for two reasons:
> ok, the reason why I thought the class should be deprecated is because
> it was not kept up-to-date with all the new discrete and continuous
> distributions that we added in the last 2-3 years. If you think it is
> useful, then we can keep it of course.

Thanks.  I don't think it needs to include data generation methods
for all distributions.  Just the ones most commonly used.  It is a
convenience class providing common random data generation methods
using a shared RandomGenerator.  All the distributions that I need
are there.  We can add others if / when users request them / provide
patches.

Phil
>
>> 1.  One object instance tied to one PRNG that generates data from
>> multiple different distributions.   This is convenient.   Sure, I
>> could refactor all of these apps to instantiate new objects for each
>> type of generated data and hopefully still be able to peg them to
>> one PRNG; but that is needless work that also complicates the code.
>>
>> 2.  There are quite a few methods in this class that have nothing to
>> do with sampling (nextPermutation, nextHexString, nextSecureXxx,
>> etc) but which conveniently share the RandomGenerator.  I guess the
>> utility methods get moved out somewhere else.  Again, I end up
>> having to refactor all of my code that uses them and when I want
>> simulations to be based on a single PRNG, I have to find a way to
>> pass the RandomGenerator around to them.
>>
>> I don't yet see the need to refactor the sampling support in the
>> distributions package; but as my own apps are not impacted by this,
>> if everyone else sees the user impact of the refactoring as
>> outweighed by the benefit, I won't stand in the way.   Please lets
>> just keep the RandomDataGenerator convenience class in the random
>> package in any case.  I will take care of whatever adjustments are
>> needed to adapt to whatever we settle on for sampling in the
>> distributions package.
> Well, it is not really necessary to do everything together and refactor
> the distributions.
>
> Probably it is better to start the other way round, and describe what I
> want to add, and see how other things fit in:
>
>  * I want a generic Sampler interface, i.e. something like this:
>  ** nextSample()
>  ** nextSamples(int size)
>  ** nextSamples(double[] samples)
>
> there could be a DiscreteSampler and ContinuousSampler interface to
> handle the cases for int / double values.
>
> The distributions could either be changed to return such a sampler as
> Gilles proposed (with the advantage that no random instance is tied to
> the distribution itself), or implement the interface directly (with the
> advantage that we would not need to refactor too much).
>
>>> One can then create a sampler for any distribution or from other
>>> sources, e.g. when needing a fast and efficient sampler without
>>> replacement (see MATH-1239).
>> +1 for sequential sampling.  I don't follow exactly why that
>> requires refactoring the distributions; but if it helps in a way I
>> don't yet understand, that will help convince me that refactoring
>> sampling in the distributions package is worth the user pain.  
> as I said above, I wanted to combine two things in one step, maybe it is
> better to go step by step.
>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
> .
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to