> On 4 May 2019, at 14:46, Gilles Sadowski <gillese...@gmail.com> wrote:
>
> Hello.
>
> On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>
>> Most of the samplers in the library have very small states that are easy
>> to compute. Some have computations that are more expensive, such as the
>> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
>>
>> However once the state is computed the only part of the state that
>> changes is the RNG. I would like to suggest a way to copy samplers as
>> something like:
>>
>>     DiscreteSampler newInstance(UniformRandomProvider)
>>
>> The new instance would share all the private state of the first sampler
>> except the RNG. This can be used for multi-threaded applications which
>> require a new sampler per thread but sample from the same distribution.
>>
>> A particular case in point is the as yet not integrated
>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
>> "large" state [2] that takes a "long" time [3] to compute but is
>> effectively immutable. This could be shared across instances saving
>> memory for parallel application.
>>
>> A copy instance would be almost zero set-up time and provide opportunity
>> for caching of commonly used samplers.
>
> The goal is sharing (immutable) state so it seems that the semantics is
> not "copy".
>
> Isn't it a "factory" that we are after? E.g. something like:
>
> public final class CachedSamplingFactory {
>     private static PoissonSamplerCache poisson = new PoissonSamplerCache();
>
>     public PoissonSampler createPoissonSampler(UniformRandomProvider rng,
>                                                double mean) {
>         if (!poisson.isCached(mean)) {
>             // Initialize (requires synchronization) ...
>             poisson.createCache(mean);
>         }
>         // Construct using pre-built state.
>         return new PoissonSampler(poisson.getCache(mean), rng);
>     }
> }
>
> [It may be overkill, more work, and less performant…]

But you need a factory for every class you want to share state for, and the
factory actually has to look in a cache. If you operate on an instance then
you get what you want: another working version of the same sampler. It would
also be thread safe without synchronisation as long as the shared state is
immutable. The only mutable state is the passed-in RNG.

> IIUC, you suggest to add "newInstance" in the "DiscreteSampler" interface (?).

I did think of extending DiscreteSampler with this functionality, rather than
adding to the interface itself. It is currently ‘functional’ because it has
only one method, and I think that should not change.

Having thought about it a bit more I like the idea of a new functional
interface. Perhaps:

    interface DiscreteSamplerProvider {
        DiscreteSampler create(UniformRandomProvider rng);
    }

Substitute ‘Provider’ for:

- Generator
- Supplier (possible clash or alignment with Java 8 depending on the way it
  is done)
- Factory (though the method is not static so I do not like this)
- etc

So this then becomes a functional interface that can be used by anything.
However, a sampler that implements it would be expected to return a sampler
matching its own functionality.

Note there are some samplers not implementing an interface that could also
benefit from this, namely CollectionSampler and
DiscreteProbabilityCollectionSampler. So does this need a generic interface:

    interface Sampler<T> {
        T sample();
    }

To be complemented with:

    interface SamplerProvider<T> {
        Sampler<T> create(UniformRandomProvider rng);
    }

So the library would require:

- SamplerProvider<T>
- DiscreteSamplerProvider
- ContinuousSamplerProvider

Any sampler can choose to implement being a Provider. There are some cases
where it is moot. For example a ZigguratNormalizedGaussianSampler just stores
the rng in the constructor. However it could still be a Provider; the method
would simply call the constructor.

It would allow writing a generic multi-threaded application that just uses e.g.
a DiscreteSamplerProvider to create samplers for each thread. You can then
drop in the actual implementation you require. For example you could swap the
type of PoissonSampler in your simulation by swapping the provider for the
Poisson distribution.

How does that sound?

Alex

> I'm a bit wary that this would compound two different functionalities:
>  * data generator (method "sample"),
>  * generator generator (method "newInstance").
> [But I currently don't have an example where this would be a problem.]
>
> Regards,
> Gilles
>
>> Alex
>>
>> [1] https://issues.apache.org/jira/browse/RNG-91
>>
>> [2] kB, or possibly MB, of tabulated data
>>
>> [3] Set-up cost for a Poisson sampler is in the order of 30 to 165 times
>> as long as a SmallMeanPoissonSampler for a mean of 2 and 32. Note
>> however that construction still takes only 1.1 and 4.5 microseconds for
>> the "long" time.
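
For illustration only, here is a rough sketch of how the provider idea from the reply could look in practice. It is not library code. UniformRandomProvider and DiscreteSampler are cut-down stand-ins declared locally so the example compiles on its own, DiscreteSamplerProvider uses the tentative name from the thread, and TableSampler, sharesStateWith and sampleInParallel are hypothetical names invented for the example. The point it tries to show is that create(rng) reuses the precomputed table by reference and only swaps the RNG, so each thread can hold its own sampler without synchronisation.

```java
import java.util.Arrays;
import java.util.SplittableRandom;

final class SharedStateSamplerSketch {
    /** Cut-down stand-in for the Commons RNG interface of the same name (assumption). */
    interface UniformRandomProvider { double nextDouble(); }

    /** Cut-down stand-in for the existing single-method sampler interface. */
    interface DiscreteSampler { int sample(); }

    /** Proposed functional interface; the name is the tentative one from the thread. */
    interface DiscreteSamplerProvider { DiscreteSampler create(UniformRandomProvider rng); }

    /** Hypothetical sampler whose set-up cost is the cumulative probability table. */
    static final class TableSampler implements DiscreteSampler, DiscreteSamplerProvider {
        private final double[] cumulative;       // written only during construction
        private final UniformRandomProvider rng; // the only per-instance mutable state

        /** "Expensive" construction: builds the table once. */
        TableSampler(UniformRandomProvider rng, double[] probabilities) {
            double[] table = new double[probabilities.length];
            double sum = 0;
            for (int i = 0; i < table.length; i++) {
                sum += probabilities[i];
                table[i] = sum;
            }
            this.cumulative = table;
            this.rng = rng;
        }

        /** Cheap construction: shares the table of an existing sampler. */
        private TableSampler(UniformRandomProvider rng, TableSampler source) {
            this.cumulative = source.cumulative; // shared by reference, not copied
            this.rng = rng;
        }

        @Override
        public int sample() {
            double u = rng.nextDouble() * cumulative[cumulative.length - 1];
            for (int i = 0; i < cumulative.length; i++) {
                if (u < cumulative[i]) {
                    return i;
                }
            }
            return cumulative.length - 1;
        }

        @Override
        public DiscreteSampler create(UniformRandomProvider rng) {
            return new TableSampler(rng, this);
        }

        /** Helper for the example only: true if both samplers use the same table. */
        boolean sharesStateWith(TableSampler other) {
            return cumulative == other.cumulative;
        }
    }

    /** Gives every thread its own sampler from the provider and sums the counts. */
    static long[] sampleInParallel(DiscreteSamplerProvider provider, int categories,
                                   int threads, int samplesPerThread) {
        long[][] perThread = new long[threads][categories];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final long[] counts = perThread[t];
            // One RNG and one sampler per thread; the table behind them is shared.
            final DiscreteSampler sampler = provider.create(new SplittableRandom(t)::nextDouble);
            workers[t] = new Thread(() -> {
                for (int i = 0; i < samplesPerThread; i++) {
                    counts[sampler.sample()]++;
                }
            });
            workers[t].start();
        }
        long[] total = new long[categories];
        try {
            for (int t = 0; t < threads; t++) {
                workers[t].join(); // join makes the worker's counts visible here
                for (int c = 0; c < categories; c++) {
                    total[c] += perThread[t][c];
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while sampling", e);
        }
        return total;
    }

    public static void main(String[] args) {
        TableSampler first = new TableSampler(new SplittableRandom(42)::nextDouble,
                                              new double[] {0.2, 0.3, 0.5});
        System.out.println(Arrays.toString(sampleInParallel(first, 3, 4, 100_000)));
    }
}
```

The multi-threaded method only sees a DiscreteSamplerProvider, so swapping in a different sampler implementation for the same distribution would not change the calling code.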