Alex Herbert created RNG-185:
--------------------------------
Summary: ArraySampler to have factory methods to sample from arrays
Key: RNG-185
URL: https://issues.apache.org/jira/browse/RNG-185
Project: Commons RNG
Issue Type: Wish
Components: sample
Affects Versions: 1.6
Reporter: Alex Herbert
The ArraySampler currently offers shuffle support for arrays, similar to the
ListSampler which shuffles a List.
It does not offer an equivalent method to sample a subset from an array. The
ListSampler API is:
{code:java}
// Sample a List of size k from the input list
public static <T> List<T> sample(UniformRandomProvider rng,
                                 List<T> collection,
                                 int k){code}
The subset is chosen using a permutation from the PermutationSampler. This
method is static and each invocation creates a new PermutationSampler. That
class maintains an array of indices for all elements of the list. Thus repeated
invocation must recreate this array each time.
An improvement would be:
* Return an ObjectSampler<double[]>
* Allow a choice between a permutation (the order of the sample matters) or
a combination (the order of the sample does not matter)
A suggested API would be:
{code:java}
public static ObjectSampler<double[]>
    permutationSampler(UniformRandomProvider rng,
                       double[] array,
                       int k)
public static ObjectSampler<double[]>
    combinationSampler(UniformRandomProvider rng,
                       double[] array,
                       int k) {code}
Implementing this for all array types requires a lot of repeated boilerplate
code, and there is currently no use case to merit its inclusion. Note that
sampling of this type for any array can be performed using e.g.:
{code:java}
// Samples k indices from [0, array.length)
final PermutationSampler s = new PermutationSampler(rng, array.length, k);
ObjectSampler<double[]> sampler = () -> {
    final int[] indices = s.sample();
    // Map the sampled indices to the array elements
    final double[] sample = new double[indices.length];
    for (int i = 0; i < sample.length; i++) {
        sample[i] = array[indices[i]];
    }
    return sample;
};{code}
Note that one advantage of a direct implementation is that the sample can be
created directly as a subset of the input array using the same method, in place
of the indices array created by the PermutationSampler. This removes generation
of an int[] for each sample.
This would effectively extend the package-private method in
SubsetSamplerUtils that performs a partial shuffle of an array to all array
types:
{code:java}
static int[] partialSample(int[] domain,
                           int steps,
                           UniformRandomProvider rng,
                           boolean upper){code}
That method is used by both the PermutationSampler and CombinationSampler to
partially shuffle the indices. The choice to return the upper or lower part of
the part-shuffled array is an optimisation for the CombinationSampler.
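As a rough illustration of the direct approach, the following is a minimal
sketch of a partial shuffle applied to a working copy of a double[]. It is not
the Commons RNG implementation: java.util.Random and Supplier<double[]> stand
in for UniformRandomProvider and ObjectSampler<double[]> so the example is
self-contained, and the class DoubleArraySubsetSketch and its method bodies are
hypothetical. It assumes the working copy can be reused between samples without
a reset, since each shuffle step picks uniformly from the elements not yet
placed, whatever their current order.

```java
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical sketch; not part of Commons RNG. */
final class DoubleArraySubsetSketch {
    private DoubleArraySubsetSketch() {}

    /**
     * Partially shuffles domain in place from the end for the given number of
     * steps, then copies out either the shuffled upper part (steps elements)
     * or the remaining lower part (length - steps elements).
     */
    static double[] partialSample(double[] domain, int steps, Random rng, boolean upper) {
        for (int i = domain.length - 1, j = 0; i > 0 && j < steps; i--, j++) {
            // Swap position i with a uniformly chosen position in [0, i]
            final int r = rng.nextInt(i + 1);
            final double tmp = domain[i];
            domain[i] = domain[r];
            domain[r] = tmp;
        }
        final int size = upper ? steps : domain.length - steps;
        final int from = upper ? domain.length - steps : 0;
        final double[] result = new double[size];
        System.arraycopy(domain, from, result, 0, size);
        return result;
    }

    /** Ordered sample of size k: always the shuffled upper part. */
    static Supplier<double[]> permutationSampler(Random rng, double[] array, int k) {
        checkSize(array.length, k);
        // Working copy, so the caller's array is never modified
        final double[] domain = array.clone();
        return () -> partialSample(domain, k, rng, true);
    }

    /** Unordered sample of size k: shuffles min(k, n - k) steps. */
    static Supplier<double[]> combinationSampler(Random rng, double[] array, int k) {
        checkSize(array.length, k);
        final double[] domain = array.clone();
        final int n = domain.length;
        if (k <= n / 2) {
            return () -> partialSample(domain, k, rng, true);
        }
        // Fewer swaps: shuffle out the n - k excluded elements, keep the rest
        return () -> partialSample(domain, n - k, rng, false);
    }

    private static void checkSize(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("k=" + k + " not in [0, " + n + "]");
        }
    }
}
```

Each sample allocates only the returned double[]; no int[] of indices is
created. The samplers in this sketch share the mutable working copy and so are
not thread-safe in this form.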
This ticket is a placeholder for discussion on this type of functionality and
possible use cases.