On Thu, 9 May 2019 at 15:41, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> On Sat, 4 May 2019 at 23:52, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> >
> >
> > > On 4 May 2019, at 22:34, Gilles Sadowski <gillese...@gmail.com> wrote:
> > >
> > > Hi.
> > >
> > > On Sat, 4 May 2019 at 21:31, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> > >>
> > >>
> > >>
> > >>> On 4 May 2019, at 14:46, Gilles Sadowski <gillese...@gmail.com> wrote:
> > >>>
> > >>> Hello.
> > >>>
> > >>> On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> > >>>>
> > >>>> Most of the samplers in the library have very small states that are easy
> > >>>> to compute. Some have computations that are more expensive, such as the
> > >>>> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> > >>>>
> > >>>> However once the state is computed the only part of the state that
> > >>>> changes is the RNG. I would like to suggest a way to copy samplers as
> > >>>> something like:
> > >>>>
> > >>>> DiscreteSampler newInstance(UniformRandomProvider)
> > >>>>
> > >>>> The new instance would share all the private state of the first sampler
> > >>>> except the RNG. This can be used for multi-threaded applications which
> > >>>> require a new sampler per thread but sample from the same distribution.
> > >>>>
> > >>>> A particular case in point is the as yet not integrated
> > >>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> > >>>> "large" state [2] that takes a "long" time [3] to compute but is
> > >>>> effectively immutable. This could be shared across instances saving
> > >>>> memory for parallel application.
> > >>>>
> > >>>> A copy instance would be almost zero set-up time and provide opportunity
> > >>>> for caching of commonly used samplers.
> > >>>
> > >>> The goal is sharing (immutable) state so it seems that the semantics is
> > >>> not "copy".
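To make the newInstance(...) idea concrete, here is a minimal sketch, not code from Commons RNG: a hypothetical TableDiscreteSampler whose cumulative probability table is built once and then shared by every copy, so only the random source differs between instances. java.util.Random stands in for UniformRandomProvider so that the example compiles without the library.

```java
import java.util.Random;

/**
 * Hypothetical sampler sketching the proposed newInstance(...) idea.
 * The cumulative probability table is the "expensive" state: it is built
 * once and shared, read-only, by every copy. Only the random source differs.
 * java.util.Random stands in here for Commons RNG's UniformRandomProvider.
 */
final class TableDiscreteSampler {
    /** Shared, never modified after construction; last entry is 1.0. */
    private final double[] cumulative;
    /** The only per-instance state. */
    private final Random rng;

    /** Assumes non-negative probabilities with a positive sum (not validated here). */
    TableDiscreteSampler(Random rng, double[] probabilities) {
        double[] c = new double[probabilities.length];
        double sum = 0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i];
            c[i] = sum;
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= sum; // normalise so that the last entry is exactly 1.0
        }
        this.cumulative = c;
        this.rng = rng;
    }

    /** Copy constructor that shares (does not clone) the table. */
    private TableDiscreteSampler(Random rng, TableDiscreteSampler source) {
        this.cumulative = source.cumulative;
        this.rng = rng;
    }

    /** Near-zero set-up cost: the table is not recomputed. */
    TableDiscreteSampler newInstance(Random rng) {
        return new TableDiscreteSampler(rng, this);
    }

    /** Inverse-CDF lookup: returns the smallest index i with u < cumulative[i]. */
    int sample() {
        double u = rng.nextDouble(); // in [0, 1)
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (u < cumulative[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    /** True if both samplers reference the same table instance. */
    boolean sharesStateWith(TableDiscreteSampler other) {
        return cumulative == other.cumulative;
    }
}
```

Because the table is only written in the constructor and reached through final fields, the copies should be safe to hand to different threads without synchronisation, provided each thread gets its own Random.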
> > >>>
> > >>> Isn't it a "factory" that we are after? E.g. something like:
> > >>> public final class CachedSamplingFactory {
> > >>>     private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> > >>>
> > >>>     public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
> > >>>         if (!poisson.isCached(mean)) {
> > >>>             poisson.createCache(mean); // Initialize (requires synchronization) ...
> > >>>         }
> > >>>         return new PoissonSampler(poisson.getCache(mean), rng); // Construct using pre-built state.
> > >>>     }
> > >>> }
> > >>> [It may be overkill, more work, and less performant…]
> > >>
> > >> But you need a factory for every class you want to share state for. And the factory actually has to look in a cache. If you operate on an instance then you get what you want: another working version of the same sampler. It would also be thread safe without synchronisation as long as the state is immutable. The only mutable state is the passed in RNG.
> > >
> > > Agreed. It was what I meant by the last sentence.
> > >
> > >>>
> > >>> IIUC, you suggest to add "newInstance" in the "DiscreteSampler" interface (?).
> > >>
> > >> I did think of extending DiscreteSampler with this functionality. Not adding to the interface as it currently is ‘functional’ as it has only one method. I think that should not change. Having thought about it a bit more I like the idea of a new functional interface. Perhaps:
> > >>
> > >> interface DiscreteSamplerProvider {
> > >>     DiscreteSampler create(UniformRandomProvider rng);
> > >> }
> > >>
> > >> Substitute ‘Provider’ for:
> > >>
> > >> - Generator
> > >> - Supplier (possible clash or alignment with Java 8 depending on the way it is done)
> > >> - Factory (though the method is not static so I do not like this)
> > >> - etc
> > >>
> > >> So this then becomes a functional interface that can be used by anything.
> > >> However, instances of a sampler would be expected to return a sampler matching their own functionality.
> > >>
> > >> Note there are some samplers not implementing an interface that also could benefit from this, namely CollectionSampler and DiscreteProbabilityCollectionSampler. So does this need a generic interface:
> > >>
> > >> Sampler<T> {
> > >>     T sample();
> > >> }
> > >>
> > >> To be complemented with:
> > >>
> > >> SamplerProvider<T> {
> > >>     Sampler<T> create(UniformRandomProvider rng);
> > >> }
> > >>
> > >> So the library would require:
> > >>
> > >> SamplerProvider<T>
> > >> DiscreteSamplerProvider
> > >> ContinuousSamplerProvider
> > >>
> > >> Any sampler can choose to implement being a Provider. There are some cases where it is moot. For example a ZigguratNormalizedGaussianSampler just stores the rng in the constructor. However, it could still be a Provider; the method would only call the constructor. It would allow writing a generic multi-threaded application that just uses e.g. a DiscreteSamplerProvider to create samplers for each thread. You can then drop in the actual implementation you require. For example you could swap the type of PoissonSampler in your simulation by swapping the provider for the Poisson distribution.
> > >>
> > >> How does that sound?
> > >
> > > Fine to have
> > > DiscreteSamplerProvider
> > > ContinuousSamplerProvider
> > > [Perhaps the "Supplier" suffix would be better to avoid confusion with
> > > "UniformRandomProvider".]
> > >
> > > At first sight, I don't think that the generic interface would have
> > > any actual use since, ultimately, the return value of "sample()"
> > > will be either "int" or "double" (no polymorphism).
> > >
> >
> > The generic interface is for the samplers that are typed for collections
> > and currently return a sample T, or those that return arrays. It would not
> > be for Integer or Double from the probability distribution samplers.
> > Here is what could use it:
> >
> > CombinationSampler implements Sampler<int[]>
> > PermutationSampler implements Sampler<int[]>
> > CollectionSampler implements Sampler<T>
> > DiscreteProbabilityCollectionSampler implements Sampler<T>
> >
> > All are in the package org.apache.commons.rng.sampling.
> >
> > Each could also implement SamplerSupplier<T>.
> >
> > The set-up cost for the CombinationSampler/PermutationSampler would not be
> > much different from the constructor and no state can be shared. No real
> > benefit here other than convenience. But the two CollectionSamplers could
> > share the final collection that is created and stored from the constructor
> > input data. For the case of a large discrete probability collection sampler
> > this could be a noticeable memory footprint as it also stores the
> > cumulative distribution table. This would also save on the construction
> > cost by not having to recompute it.
> >
> > Alex
> >
>
> Any further thoughts on this? I think that Supplier is perhaps the wrong
> term. A Java 8 Supplier has a get() functional method with no parameters.
> These interfaces would require a UniformRandomProvider as the argument.
> However, the Java 8 Function<T, R> apply method, which is applicable here,
> is a poorer name. So:
>
> DiscreteSampler
> ContinuousSampler
> Sampler<T>
>
> and trying a few options out:
>
> DiscreteSamplerFactory createDiscreteSampler(UniformRandomProvider)
> ContinuousSamplerFactory createContinuousSampler(UniformRandomProvider)
> SamplerFactory<T> createSampler(UniformRandomProvider)
>
> vs.
>
> DiscreteSamplerFactory newDiscreteSampler(UniformRandomProvider)
> ContinuousSamplerFactory newContinuousSampler(UniformRandomProvider)
> SamplerFactory<T> newSampler(UniformRandomProvider)
>
> vs.
>
> DiscreteSamplerSupplier getDiscreteSampler(UniformRandomProvider)
> ContinuousSamplerSupplier getContinuousSampler(UniformRandomProvider)
> SamplerSupplier<T> getSampler(UniformRandomProvider)
>
> vs.
>
> DiscreteSamplerGenerator newDiscreteSampler(UniformRandomProvider)
> ContinuousSamplerGenerator newContinuousSampler(UniformRandomProvider)
> SamplerGenerator<T> newSampler(UniformRandomProvider)
>
> The 'create/new' nomenclature does convey that a new instance is expected,
> so I prefer that over get. I'm undecided on which is the most appropriate
> noun for the interface name.
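The Supplier versus Function point can be shown in a few lines. This sketch assumes the hypothetical name DiscreteSamplerFactory with method newDiscreteSampler (one of the options listed); DiscreteSampler is declared locally as a stand-in with the single method int sample(), and java.util.Random replaces UniformRandomProvider. The driver only knows about the factory, so the sampler implementation can be swapped without touching the threading code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Local stand-in for the library's DiscreteSampler (one method: int sample()). */
interface DiscreteSampler {
    int sample();
}

/**
 * Hypothetical functional interface using one of the names floated above.
 * Unlike Supplier#get(), its single method takes the RNG as an argument;
 * java.util.Random stands in for UniformRandomProvider.
 */
@FunctionalInterface
interface DiscreteSamplerFactory {
    DiscreteSampler newDiscreteSampler(Random rng);

    /** Same shape as Function<T, R>, only with a more descriptive method name. */
    default Function<Random, DiscreteSampler> asFunction() {
        return this::newDiscreteSampler;
    }
}

/** Generic driver: one sampler (and one RNG) per task, nothing mutable is shared. */
final class PerThreadSampling {
    static List<Long> sumPerTask(DiscreteSamplerFactory factory, int tasks, int samplesPerTask) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final long seed = t; // fixed seeds keep the demo reproducible
                futures.add(pool.submit(() -> {
                    DiscreteSampler sampler = factory.newDiscreteSampler(new Random(seed));
                    long sum = 0;
                    for (int i = 0; i < samplesPerTask; i++) {
                        sum += sampler.sample();
                    }
                    return sum;
                }));
            }
            List<Long> sums = new ArrayList<>();
            for (Future<Long> f : futures) {
                sums.add(f.get());
            }
            return sums;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A lambda such as rng -> () -> 1 + rng.nextInt(6) is then a complete factory for a six-sided die; passing a factory for another distribution changes the samples but not the driver. The default asFunction() method shows that the interface has the same shape as Function<Random, DiscreteSampler>, only with a more descriptive method name.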

How about making it clearer that the purpose is to share state, and using
the "fluent API":

interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

E.g.

public class CollectionSampler<T> implements SharedStateSampler<CollectionSampler<T>> {
    // ...

    public CollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
        return /* new instance that shares the immutable state */;
    }
}

Gilles

> > >>
> > >>
> > >>
> > >>> I'm a bit wary that this would compound two different functionalities:
> > >>> * data generator (method "sample"),
> > >>> * generator generator (method "newInstance").
> > >>> [But I currently don't have an example where this would be a problem.]
> > >>>
> > >>> Regards,
> > >>> Gilles
> > >>>
> > >>>> Alex
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/RNG-91
> > >>>>
> > >>>> [2] kB, or possibly MB, of tabulated data
> > >>>>
> > >>>> [3] Set-up cost for a Poisson sampler is in the order of 30 to 165 times
> > >>>> as long as a SmallMeanPoissonSampler for a mean of 2 and 32. Note
> > >>>> however that construction still takes only 1.1 and 4.5 microseconds for
> > >>>> the "long" time.
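A compilable version of the SharedStateSampler suggestion might look like the following. It is a sketch under stated assumptions: UniformRandomProvider is a local stand-in declaring only nextInt(int), and SimpleCollectionSampler is a hypothetical simplification rather than the library's CollectionSampler. The constructor makes one defensive copy of the collection; withUniformRandomProvider(...) reuses that copy and changes only the RNG.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Local stand-in that declares only the method this sketch needs. */
interface UniformRandomProvider {
    /** Returns a uniformly distributed value in [0, n). */
    int nextInt(int n);
}

/** The interface as proposed: return a new sampler bound to another RNG. */
interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

/** Hypothetical stand-in for CollectionSampler; the copied item list is the shared state. */
final class SimpleCollectionSampler<T> implements SharedStateSampler<SimpleCollectionSampler<T>> {
    private final List<T> items;             // unmodifiable, shared by all copies
    private final UniformRandomProvider rng; // per-instance

    SimpleCollectionSampler(UniformRandomProvider rng, Collection<T> collection) {
        if (collection.isEmpty()) {
            throw new IllegalArgumentException("Empty collection");
        }
        this.rng = rng;
        // One defensive copy at construction; later copies reuse it.
        this.items = Collections.unmodifiableList(new ArrayList<>(collection));
    }

    /** Sharing constructor: reuses the source's list, no copy is made. */
    private SimpleCollectionSampler(UniformRandomProvider rng, SimpleCollectionSampler<T> source) {
        this.rng = rng;
        this.items = source.items;
    }

    @Override
    public SimpleCollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
        return new SimpleCollectionSampler<>(rng, this);
    }

    T sample() {
        return items.get(rng.nextInt(items.size()));
    }

    List<T> items() {
        return items;
    }
}
```

The self-referential type argument (SharedStateSampler<SimpleCollectionSampler<T>>) is what lets withUniformRandomProvider return the concrete sampler type, so callers keep access to sample() without a cast.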