On Thu, 9 May 2019 at 15:41, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> On Sat, 4 May 2019 at 23:52, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> >
> >
> > > On 4 May 2019, at 22:34, Gilles Sadowski <gillese...@gmail.com> wrote:
> > >
> > > Hi.
> > >
> > > On Sat, 4 May 2019 at 21:31, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> > >>
> > >>
> > >>
> > >>> On 4 May 2019, at 14:46, Gilles Sadowski <gillese...@gmail.com> wrote:
> > >>>
> > >>> Hello.
> > >>>
> > >>> On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> > >>>>
> > >>>> Most of the samplers in the library have very small states that are
> > easy
> > >>>> to compute. Some have computations that are more expensive, such as
> > the
> > >>>> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> > >>>>
> > >>>> However once the state is computed the only part of the state that
> > >>>> changes is the RNG. I would like to suggest a way to copy samplers as
> > >>>> something like:
> > >>>>
> > >>>> DiscreteSampler newInstance(UniformRandomProvider)
> > >>>>
> > >>>> The new instance would share all the private state of the first
> > sampler
> > >>>> except the RNG. This can be used for multi-threaded applications which
> > >>>> require a new sampler per thread but sample from the same
> > distribution.
> > >>>>
> > >>>> A particular case in point is the as yet not integrated
> > >>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> > >>>> "large" state [2] that takes a "long" time [3] to compute but is
> > >>>> effectively immutable. This could be shared across instances saving
> > >>>> memory for parallel application.
> > >>>>
> > >>>> A copy instance would be almost zero set-up time and provide
> > opportunity
> > >>>> for caching of commonly used samplers.
> > >>>
> > >>> The goal is sharing (immutable) state so it seems that the semantics is
> > >>> not "copy".
> > >>>
> > >>> Isn't it a "factory" that we are after?  E.g. something like:
> > >>> public final class CachedSamplingFactory {
> > >>>   private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> > >>>
> > >>>   public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
> > >>>       if (!poisson.isCached(mean)) {
> > >>>           // Initialize (requires synchronization) ...
> > >>>           poisson.createCache(mean);
> > >>>       }
> > >>>       // Construct using pre-built state.
> > >>>       return new PoissonSampler(poisson.getCache(mean), rng);
> > >>>   }
> > >>> }
> > >>> [It may be overkill, more work, and less performant…]
> > >>
> > >> But you need a factory for every class you want to share state for. And
> > the factory actually has to look in a cache. If you operate on an instance
> > then you get what you want. Another working version of the same sampler. It
> > would also be thread safe without synchronisation as long as the state is
> > immutable. The only mutable state is the passed in RNG.
> > >
> > > Agreed.  It was what I meant by the last sentence.
> > >
> > >>>
> > >>> IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
> > interface (?).
> > >>
> > >> I did think of extending DiscreteSampler with this functionality. Not
> > adding to the interface as it currently is ‘functional’ as it has only one
> > method. I think that should not change. Having thought about it a bit more
> > I like the idea of a new functional interface. Perhaps:
> > >>
> > >> interface DiscreteSamplerProvider {
> > >>    DiscreteSampler create(UniformRandomProvider rng);
> > >> }
> > >>
> > >> Substitute ‘Provider’ for:
> > >>
> > >> - Generator
> > >> - Supplier (possible clash or alignment with Java 8 depending on the
> > way it is done)
> > >> - Factory (though the method is not static so I do not like this)
> > >> - etc
> > >>
> > >> So this then becomes a functional interface that can be used by
> > anything. However instances of a sampler would be expected to return a
> > sampler matching their own functionality.
> > >>
> > >> Note there are some samplers not implementing an interface that also
> > could benefit from this. Namely CollectionSampler and
> > DiscreteProbabilityCollectionSampler. So does this need a generic interface:
> > >>
> > >> Sampler<T> {
> > >>    T sample();
> > >> }
> > >>
> > >> To be complemented with:
> > >>
> > >> SamplerProvider<T> {
> > >>    Sampler<T> create(UniformRandomProvider rng);
> > >> }
> > >>
> > >> So the library would require:
> > >>
> > >> SamplerProvider<T>
> > >> DiscreteSamplerProvider
> > >> ContinuousSamplerProvider
> > >>
> > >> Any sampler can choose to implement being a Provider. There are some
> > cases where it is moot. For example a ZigguratNormalizedGaussianSampler
> > just stores the rng in the constructor. However it could still be a
> > Provider just the method would only call the constructor. It would allow
> > writing a generic multi-threaded application that just uses e.g. a
> > DiscreteSamplerProvider to create samplers for each thread. You can then
> > drop in the actual implementation you require. For example you could swap
> > the type of PoissonSampler in your simulation by swapping the provider for
> > the Poisson distribution.
> > >>
> > >> How does that sound?
> > >
> > > Fine to have
> > >  DiscreteSamplerProvider
> > >  ContinuousSamplerProvider
> > > [Perhaps the "Supplier" suffix would be better to avoid confusion with
> > > "UniformRandomProvider".]
> > >
> > > At first sight, I don't think that the generic interface would have
> > > any actual use since, ultimately, the return value of "sample()"
> > > will be either "int" or "double" (no polymorphism).
> > >
> >
> > The generic interface is for the samplers that are typed for collections
> > and currently return a sample T, or those that return arrays. It would not
> > be for Integer or Double from the probability distribution samplers. Here
> > are what could use it:
> >
> > CombinationSampler implements Sampler<int[]>
> > PermutationSampler implements Sampler<int[]>
> > CollectionSampler implements Sampler<T>
> > DiscreteProbabilityCollectionSampler implements Sampler<T>
> >
> > All are in the package org.apache.commons.rng.sampling.
> >
> > Each could also implement SamplerSupplier<T>.
> >
> > The set-up cost for the CombinationSampler/PermutationSampler would not be
> > much different from the constructor and no state can be shared. No real
> > benefit here other than convenience. But the two CollectionSamplers could
> > share the final collection that is created and stored from the constructor
> > input data. For the case of a large discrete probability collection sampler
> > this could be a noticeable memory footprint as it also stores the
> > cumulative distribution table. This would also save on the construction
> > cost by not having to recompute it.
> >
> > Alex
> >
>
> Any further thoughts on this? I think that Supplier is perhaps the wrong
> term. A Java 8 Supplier has a get() functional method with no parameters.
> These interfaces would require a UniformRandomProvider as the argument.
> However the Java 8 Function<T, R> apply method, which is applicable here,
> is a poorer name. So:
>
> DiscreteSampler
> ContinuousSampler
> Sampler<T>
>
> and trying a few options out:
>
> DiscreteSamplerFactory createDiscreteSampler(UniformRandomProvider)
> ContinuousSamplerFactory createContinuousSampler(UniformRandomProvider)
> SamplerFactory<T> createSampler(UniformRandomProvider)
>
> vs.
>
> DiscreteSamplerFactory newDiscreteSampler(UniformRandomProvider)
> ContinuousSamplerFactory newContinuousSampler(UniformRandomProvider)
> SamplerFactory<T> newSampler(UniformRandomProvider)
>
> vs.
>
> DiscreteSamplerSupplier getDiscreteSampler(UniformRandomProvider)
> ContinuousSamplerSupplier getContinuousSampler(UniformRandomProvider)
> SamplerSupplier<T> getSampler(UniformRandomProvider)
>
> vs.
>
> DiscreteSamplerGenerator newDiscreteSampler(UniformRandomProvider)
> ContinuousSamplerGenerator newContinuousSampler(UniformRandomProvider)
> SamplerGenerator<T> newSampler(UniformRandomProvider)
>
> The 'create/new' nomenclature does convey that a new instance is expected,
> so I prefer that over get. I'm undecided on which is the most appropriate
> noun for the interface name.

How about making clearer that the purpose is to share state, and
use the "fluent API":

interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

E.g.

public class CollectionSampler<T>
    implements SharedStateSampler<CollectionSampler<T>> {
    // ...
    public CollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
        return /* new instance that shares the immutable state */;
    }
}
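
A minimal, self-contained sketch of how this might look in use. To keep
it compilable without the library, "UniformRandomProvider" below is a
one-method stand-in for the real interface, and "SimpleCollectionSampler"
(with its "of" and "sharesStateWith" helpers) is a hypothetical
simplification, not the actual CollectionSampler code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Stand-in for org.apache.commons.rng.UniformRandomProvider (assumption: only nextInt(int) is needed). */
interface UniformRandomProvider {
    int nextInt(int n);
}

/** The interface proposed above. */
interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

/** Hypothetical simplified sampler: picks items uniformly from a fixed list. */
final class SimpleCollectionSampler<T>
    implements SharedStateSampler<SimpleCollectionSampler<T>> {
    /** Immutable state, shared between instances. */
    private final List<T> items;
    /** The only per-instance mutable state. */
    private final UniformRandomProvider rng;

    private SimpleCollectionSampler(List<T> sharedItems, UniformRandomProvider rng) {
        this.items = sharedItems;
        this.rng = rng;
    }

    /** Pays the set-up cost once: copies the input into an unmodifiable list. */
    static <T> SimpleCollectionSampler<T> of(UniformRandomProvider rng, Collection<T> source) {
        if (source.isEmpty()) {
            throw new IllegalArgumentException("Empty collection");
        }
        return new SimpleCollectionSampler<>(
            Collections.unmodifiableList(new ArrayList<>(source)), rng);
    }

    T sample() {
        return items.get(rng.nextInt(items.size()));
    }

    @Override
    public SimpleCollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
        // New instance, same list reference: no copy, no recomputation.
        return new SimpleCollectionSampler<>(items, rng);
    }

    /** Helper for this sketch only: true if both samplers use the same list object. */
    boolean sharesStateWith(SimpleCollectionSampler<T> other) {
        return items == other.items;
    }
}

final class SharedStateDemo {
    public static void main(String[] args) {
        Random sourceA = new Random(1L);
        Random sourceB = new Random(2L);
        SimpleCollectionSampler<String> first =
            SimpleCollectionSampler.of(sourceA::nextInt, Arrays.asList("a", "b", "c"));
        // E.g. one of these per thread, each with its own generator.
        SimpleCollectionSampler<String> second =
            first.withUniformRandomProvider(sourceB::nextInt);
        System.out.println(first.sample() + " " + second.sample());
        System.out.println(first.sharesStateWith(second)); // prints true
    }
}
```

Since the return type is the type parameter R, the caller gets back the
concrete sampler type and not just a generic interface.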

Gilles

> > >>
> > >>
> > >>
> > >>> I'm a bit wary that this would compound two different functionalities:
> > >>> * data generator (method "sample"),
> > >>> * generator generator (method "newInstance").
> > >>> [But I currently don't have an example where this would be a problem.]
> > >>>
> > >>> Regards,
> > >>> Gilles
> > >>>
> > >>>> Alex
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/RNG-91
> > >>>>
> > >>>> [2] kB, or possibly MB, of tabulated data
> > >>>>
> > >>>> [3] Set-up cost for a Poisson sampler is in the order of 30 to 165
> > times
> > >>>> as long as a SmallMeanPoissonSampler for a mean of 2 and 32. Note
> > >>>> however that construction still takes only 1.1 and 4.5 microseconds
> > for
> > >>>> the "long" time.
> > >

