On 7/12/15 10:38 AM, Thomas Neidhart wrote: > On 07/12/2015 04:58 PM, Phil Steitz wrote: >> On 7/12/15 2:50 AM, Thomas Neidhart wrote: >>> On 07/11/2015 09:43 PM, Phil Steitz wrote: >>>> On 7/11/15 12:29 PM, Thomas Neidhart wrote: >>>>> On 07/11/2015 09:08 PM, Phil Steitz wrote: >>>>>> The code implemented in MATH-1242 to improve performance of KS >>>>>> monteCarloP in-lines efficient generation of random boolean arrays. >>>>>> Unfortunately, I think the implementation is not quite random (see >>>>>> comments on the ticket). To verify it, we need to be able to test >>>>>> the random boolean array generation directly. To do that, we have >>>>>> to either expose the method (at least as protected) in the KS class >>>>>> or add it somewhere else. I propose the latter but am not sure >>>>>> where to put it. For speed, we need to avoid array copies, so the >>>>>> API will have to be something like randomize(boolean[], nTrue). It >>>>>> could go in the swelling MathArrays class, or RandomDataGenerator. >>>>>> The latter probably makes more sense, but the API does not fit too >>>>>> well. Any ideas? >>>>> If it is just for testing purposes, you can also access the method in >>>>> question via reflection, see an example here: >>>>> http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes >>>> Do you think it *should* be a private method of the K-S class? >>> Right now, I do not see much uses outside the class, but if we decide to >>> make it public then I would prefer a special util class in the random >>> package to avoid cluttering the MathArrays class. >> OK, for now we can make it private and use the reflection method >> above to test it. > ok, but I guess it is also fine to make it package private as sebb > suggested. We did something similar recently for some of the improved > sampling methods provided by Otmar. > >>> Regarding the RandomDataGenerator: I think this class should be >>> deprecated and replaced by a Sampler interface as proposed by Gilles. >> Please consider keeping this class. Consider this a user request. >> I have quite a few applications that use this class for two reasons: > ok, the reason why I thought the class should be deprecated is because > it was not kept up-to-date with all the new discrete and continuous > distributions that we added in the last 2-3 years. If you think it is > useful, then we can keep it of course.
Thanks. I don't think it needs to include data generation methods for all distributions. Just the ones most commonly used. It is a convenience class providing common random data generation methods using a shared RandomGenerator. All the distributions that I need are there. We can add others if / when users request them / provide patches. Phil > >> 1. One object instance tied to one PRNG that generates data from >> multiple different distributions. This is convenient. Sure, I >> could refactor all of these apps to instantiate new objects for each >> type of generated data and hopefully still be able to peg them to >> one PRNG; but that is needless work that also complicates the code. >> >> 2. There are quite a few methods in this class that have nothing to >> do with sampling (nextPermutation, nextHexString, nextSecureXxx, >> etc) but which conveniently share the RandomGenerator. I guess the >> utility methods get moved out somewhere else. Again, I end up >> having to refactor all of my code that uses them and when I want >> simulations to be based on a single PRNG, I have to find a way to >> pass the RandomGenerator around to them. >> >> I don't yet see the need to refactor the sampling support in the >> distributions package; but as my own apps are not impacted by this, >> if everyone else sees the user impact of the refactoring as >> outweighed by the benefit, I won't stand in the way. Please lets >> just keep the RandomDataGenerator convenience class in the random >> package in any case. I will take care of whatever adjustments are >> needed to adapt to whatever we settle on for sampling in the >> distributions package. > Well, it is not really necessary to do everything together and refactor > the distributions. > > Probably it is better to start the other way round, and describe what I > want to add, and see how other things fit in: > > * I want a generic Sampler interface, i.e. something like this: > ** nextSample() > ** nextSamples(int size) > ** nextSamples(double[] samples) > > there could be a DiscreteSampler and ContinuousSampler interface to > handle the cases for int / double values. > > The distributions could either be changed to return such a sampler as > Gilles proposed (with the advantage that no random instance is tied to > the distribution itself), or implement the interface directly (with the > advantage that we would not need to refactor too much). > >>> One can then create a sampler for any distribution or from other >>> sources, e.g. when needing a fast and efficient sampler without >>> replacement (see MATH-1239). >> +1 for sequential sampling. I don't follow exactly why that >> requires refactoring the distributions; but if it helps in a way I >> don't yet understand, that will help convince me that refactoring >> sampling in the distributions package is worth the user pain. > as I said above, I wanted to combine two things in one step, maybe it is > better to go step by step. > > Thomas > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > . > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org