On 09/05/2019 15:39, Gilles Sadowski wrote:
On Thu, 9 May 2019 at 15:41, Alex Herbert <alex.d.herb...@gmail.com> wrote:
On Sat, 4 May 2019 at 23:52, Alex Herbert <alex.d.herb...@gmail.com> wrote:


On 4 May 2019, at 22:34, Gilles Sadowski <gillese...@gmail.com> wrote:

Hi.

On Sat, 4 May 2019 at 21:31, Alex Herbert <alex.d.herb...@gmail.com> wrote:


On 4 May 2019, at 14:46, Gilles Sadowski <gillese...@gmail.com> wrote:

Hello.

On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herb...@gmail.com> wrote:
Most of the samplers in the library have very small states that are easy
to compute. Some have computations that are more expensive, such as the
LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.

However once the state is computed the only part of the state that
changes is the RNG. I would like to suggest a way to copy samplers as
something like:

DiscreteSampler newInstance(UniformRandomProvider)

The new instance would share all the private state of the first sampler
except the RNG. This can be used for multi-threaded applications which
require a new sampler per thread but sample from the same distribution.
A particular case in point is the as yet not integrated
MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
"large" state [2] that takes a "long" time [3] to compute but is
effectively immutable. This could be shared across instances, saving
memory in parallel applications.

A copy instance would have almost zero set-up time and provide an
opportunity for caching of commonly used samplers.
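
A minimal sketch of the copy idea, under stated assumptions: the two interfaces below are stand-ins for the library's UniformRandomProvider and DiscreteSampler (only the methods needed here), and TabulatedDiscreteSampler is a hypothetical sampler invented for illustration, not a class from the library. It shows a newInstance method that hands the already computed table to the new sampler instead of rebuilding it.

```java
// Stand-in for org.apache.commons.rng.UniformRandomProvider (assumption:
// only nextDouble() is needed for this sketch).
interface UniformRandomProvider {
    double nextDouble();
}

// Stand-in for the library's DiscreteSampler interface.
interface DiscreteSampler {
    int sample();
}

// Hypothetical sampler: inverts a precomputed cumulative probability table.
final class TabulatedDiscreteSampler implements DiscreteSampler {
    // Never modified after construction, so it can be shared between instances.
    private final double[] cumulative;
    // The only state that changes while sampling.
    private final UniformRandomProvider rng;

    // The "expensive" path: builds the normalised cumulative table.
    TabulatedDiscreteSampler(UniformRandomProvider rng, double[] probabilities) {
        this.rng = rng;
        this.cumulative = new double[probabilities.length];
        double sum = 0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i];
            cumulative[i] = sum;
        }
        for (int i = 0; i < cumulative.length; i++) {
            cumulative[i] /= sum;
        }
    }

    // The "cheap" path: reuses the table of an existing sampler.
    private TabulatedDiscreteSampler(UniformRandomProvider rng,
                                     TabulatedDiscreteSampler source) {
        this.rng = rng;
        this.cumulative = source.cumulative; // shared reference, no copy
    }

    // Same distribution, different RNG, no recomputation.
    DiscreteSampler newInstance(UniformRandomProvider newRng) {
        return new TabulatedDiscreteSampler(newRng, this);
    }

    @Override
    public int sample() {
        final double u = rng.nextDouble();
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1;
    }
}
```

Each thread would call newInstance with its own RNG; no synchronisation is needed as long as nothing writes to the shared table after construction.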
The goal is sharing (immutable) state so it seems that the semantics is
not "copy".

Isn't it a "factory" that we are after?  E.g. something like:
public final class CachedSamplingFactory {
   private static PoissonSamplerCache poisson = new PoissonSamplerCache();
   public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
       if (!poisson.isCached(mean)) {
           poisson.createCache(mean); // Initialize (requires synchronization) ...
       }
       return new PoissonSampler(poisson.getCache(mean), rng); // Construct using pre-built state.
   }
}
[It may be overkill, more work, and less performant…]
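
For comparison, a runnable sketch of the caching-factory approach. Everything in it is an assumption made for illustration: PoissonState and SimplePoissonSampler are hypothetical classes (the state is just exp(-mean), and sampling uses Knuth's multiplication method), UniformRandomProvider is a one-method stand-in for the library interface, and the synchronisation is delegated to ConcurrentHashMap.computeIfAbsent rather than written by hand.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stand-in for org.apache.commons.rng.UniformRandomProvider.
interface UniformRandomProvider {
    double nextDouble();
}

// Hypothetical immutable pre-built state; a real sampler would hold more.
final class PoissonState {
    final double p0;

    PoissonState(double mean) {
        this.p0 = Math.exp(-mean);
    }
}

// Hypothetical small-mean Poisson sampler built from pre-computed state.
final class SimplePoissonSampler {
    private final PoissonState state;
    private final UniformRandomProvider rng;

    SimplePoissonSampler(PoissonState state, UniformRandomProvider rng) {
        this.state = state;
        this.rng = rng;
    }

    PoissonState state() {
        return state;
    }

    // Knuth's method: multiply uniforms until the product drops to exp(-mean).
    int sample() {
        int n = 0;
        double r = rng.nextDouble();
        while (r > state.p0) {
            r *= rng.nextDouble();
            n++;
        }
        return n;
    }
}

final class CachedSamplingFactory {
    // One cache per sampler type; the key is the boxed mean.
    private static final ConcurrentMap<Double, PoissonState> POISSON =
        new ConcurrentHashMap<>();

    private CachedSamplingFactory() {}

    static SimplePoissonSampler createPoissonSampler(UniformRandomProvider rng,
                                                     double mean) {
        // Computes the state at most once per mean, atomically.
        PoissonState state = POISSON.computeIfAbsent(mean, m -> new PoissonState(m));
        return new SimplePoissonSampler(state, rng);
    }
}
```

The sketch makes the costs visible: every call pays for a map lookup, and each sampler type needs its own state class and cache.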
But you need a factory for every class you want to share state for. And
the factory actually has to look in a cache. If you operate on an instance
then you get what you want: another working version of the same sampler. It
would also be thread safe without synchronisation as long as the state is
immutable. The only mutable state is the passed-in RNG.
Agreed.  It was what I meant by the last sentence.

IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
interface (?).
I did think of extending DiscreteSampler with this functionality, rather
than adding to the interface itself: it is currently ‘functional’ since it
has only one method, and I think that should not change. Having thought
about it a bit more I like the idea of a new functional interface. Perhaps:
interface DiscreteSamplerProvider {
    DiscreteSampler create(UniformRandomProvider rng);
}

Alternatives to ‘Provider’:

- Generator
- Supplier (possible clash or alignment with Java 8 depending on the
way it is done)
- Factory (though the method is not static so I do not like this)
- etc

So this then becomes a functional interface that can be used by
anything. However instances of a sampler would be expected to return a
sampler matching their own functionality.
Note there are some samplers not implementing an interface that also
could benefit from this. Namely CollectionSampler and
DiscreteProbabilityCollectionSampler. So does this need a generic interface:
interface Sampler<T> {
    T sample();
}

To be complemented with:

interface SamplerProvider<T> {
    Sampler<T> create(UniformRandomProvider rng);
}

So the library would require:

SamplerProvider<T>
DiscreteSamplerProvider
ContinuousSamplerProvider

Any sampler can choose to implement being a Provider. There are some
cases where it is moot. For example a ZigguratNormalizedGaussianSampler
just stores the rng in the constructor. However it could still be a
Provider; the method would simply call the constructor. It would allow
writing a generic multi-threaded application that just uses e.g. a
DiscreteSamplerProvider to create samplers for each thread. You can then
drop in the actual implementation you require. For example you could swap
the type of PoissonSampler in your simulation by swapping the provider for
the Poisson distribution.
How does that sound?
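
A rough sketch of such a generic multi-threaded application. The three interfaces are stand-ins written for this example (DiscreteSamplerProvider follows the shape proposed above; the other two mimic the library interfaces), ParallelSampling and sumPerThread are hypothetical names, and java.util.Random is used only as a convenient source of doubles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-ins for the library interfaces (assumption: single-method versions).
interface UniformRandomProvider {
    double nextDouble();
}

interface DiscreteSampler {
    int sample();
}

// The proposed functional interface.
interface DiscreteSamplerProvider {
    DiscreteSampler create(UniformRandomProvider rng);
}

final class ParallelSampling {
    private ParallelSampling() {}

    // Draws samplesPerThread values on each thread and returns the per-thread sums.
    // The code never names a concrete sampler type, only the provider.
    static long[] sumPerThread(DiscreteSamplerProvider provider,
                               int threads, int samplesPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Random source = new Random(t); // one RNG per thread
                DiscreteSampler sampler = provider.create(source::nextDouble);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = 0; i < samplesPerThread; i++) {
                        sum += sampler.sample();
                    }
                    return sum;
                };
                futures.add(pool.submit(task));
            }
            long[] sums = new long[threads];
            for (int t = 0; t < threads; t++) {
                sums[t] = futures.get(t).get();
            }
            return sums;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Swapping the distribution or the sampler implementation then means passing a different provider, for example a lambda such as rng -> () -> rng.nextDouble() < 0.5 ? 1 : 0 for a fair coin.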
Fine to have
  DiscreteSamplerProvider
  ContinuousSamplerProvider
[Perhaps the "Supplier" suffix would be better to avoid confusion with
"UniformRandomProvider".]

At first sight, I don't think that the generic interface would have
any actual use since, ultimately, the return value of "sample()"
will be either "int" or "double" (no polymorphism).

The generic interface is for the samplers that are typed for collections
and currently return a sample T, or those that return arrays. It would not
be for Integer or Double from the probability distribution samplers. Here
is what could use it:

CombinationSampler implements Sampler<int[]>
PermutationSampler implements Sampler<int[]>
CollectionSampler implements Sampler<T>
DiscreteProbabilityCollectionSampler implements Sampler<T>

All are in the package org.apache.commons.rng.sampling.

Each could also implement SamplerSupplier<T>.

The set-up cost for the CombinationSampler/PermutationSampler would not be
much different from the constructor and no state can be shared. No real
benefit here other than convenience. But the two CollectionSamplers could
share the final collection that is created and stored from the constructor
input data. For the case of a large discrete probability collection sampler
this could be a noticeable memory footprint as it also stores the
cumulative distribution table. This would also save on the construction
cost by not having to recompute it.

Alex

Any further thoughts on this? I think that Supplier is perhaps the wrong
term. A Java 8 Supplier has a get() functional method with no parameters.
These interfaces would require a UniformRandomProvider as the argument.
However, the apply method of the Java 8 Function<T, R>, which is the one
applicable here, is a poorer name. So:

DiscreteSampler
ContinuousSampler
Sampler<T>

and trying a few options out:

DiscreteSamplerFactory createDiscreteSampler(UniformRandomProvider)
ContinuousSamplerFactory createContinuousSampler(UniformRandomProvider)
SamplerFactory<T> createSampler(UniformRandomProvider)

vs.

DiscreteSamplerFactory newDiscreteSampler(UniformRandomProvider)
ContinuousSamplerFactory newContinuousSampler(UniformRandomProvider)
SamplerFactory<T> newSampler(UniformRandomProvider)

vs.

DiscreteSamplerSupplier getDiscreteSampler(UniformRandomProvider)
ContinuousSamplerSupplier getContinuousSampler(UniformRandomProvider)
SamplerSupplier<T> getSampler(UniformRandomProvider)

vs.

DiscreteSamplerGenerator newDiscreteSampler(UniformRandomProvider)
ContinuousSamplerGenerator newContinuousSampler(UniformRandomProvider)
SamplerGenerator<T> newSampler(UniformRandomProvider)

The 'create/new' nomenclature does convey that a new instance is expected,
so I prefer that over get. I'm undecided on which is the most appropriate
noun for the interface name.
How about making clearer that the purpose is to share state, and
use the "fluent API":

interface SharedStateSampler<R> {
     R withUniformRandomProvider(UniformRandomProvider rng);
}

E.g.

public class CollectionSampler<T>
     implements SharedStateSampler<CollectionSampler<T>> {
     // ...
     public CollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
         return /* new instance that shares the immutable state */;
     }
}
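
A self-contained sketch of how the body could be filled in, under stated assumptions: UniformRandomProvider is a one-method stand-in for the library interface, and SimpleCollectionSampler is a hypothetical class used in place of the library's CollectionSampler so that nothing here is mistaken for its actual implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Stand-in for org.apache.commons.rng.UniformRandomProvider
// (assumption: only nextInt(int) is needed; returns a value in [0, n)).
interface UniformRandomProvider {
    int nextInt(int n);
}

// The interface proposed above.
interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

// Hypothetical sampler that picks a uniformly random item from a collection.
final class SimpleCollectionSampler<T>
        implements SharedStateSampler<SimpleCollectionSampler<T>> {
    // Filled once in the constructor and never modified afterwards.
    private final List<T> items;
    private final UniformRandomProvider rng;

    SimpleCollectionSampler(UniformRandomProvider rng, Collection<T> collection) {
        if (collection.isEmpty()) {
            throw new IllegalArgumentException("Empty collection");
        }
        this.rng = rng;
        this.items = new ArrayList<>(collection); // defensive copy, made once
    }

    // Used only by withUniformRandomProvider: shares the list, no copy.
    private SimpleCollectionSampler(UniformRandomProvider rng,
                                    SimpleCollectionSampler<T> source) {
        this.rng = rng;
        this.items = source.items;
    }

    @Override
    public SimpleCollectionSampler<T> withUniformRandomProvider(UniformRandomProvider newRng) {
        return new SimpleCollectionSampler<>(newRng, this);
    }

    T sample() {
        return items.get(rng.nextInt(items.size()));
    }
}
```

Because the type parameter R is the sampler's own type, the caller gets back a fully typed sampler and needs no cast.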

Gilles

Well that is much nicer. I am fine with that.

I note that this idea can be applied to any sampler, even one with a very small state. Should we aim for that, or only pick the low-hanging fruit of those samplers that have a relatively large construction cost or internal state?

I would favour doing it for all samplers that have a state just to be consistent. It just needs a bit more work to put into the library.





I'm a bit wary that this would compound two different functionalities:
* data generator (method "sample"),
* generator generator (method "newInstance").
[But I currently don't have an example where this would be a problem.]

Regards,
Gilles

Alex

[1] https://issues.apache.org/jira/browse/RNG-91
[2] kB, or possibly MB, of tabulated data

[3] Set-up cost for a Poisson sampler is in the order of 30 to 165 times
as long as a SmallMeanPoissonSampler for a mean of 2 and 32. Note
however that construction still takes only 1.1 and 4.5 microseconds for
the "long" time.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

