[
https://issues.apache.org/jira/browse/STATISTICS-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644303#comment-17644303
]
Alex Herbert commented on STATISTICS-63:
----------------------------------------
Note:
Originally the port dropped UniformRandomProvider from the public API to
minimise dependencies. However, it was still used internally to wrap a provided
LongSupplier of random bits as a random generator.
Since the only functionality required is to map a positive int x to a value in
the range [0, x), this can be provided directly as an IntUnaryOperator. The
dependency on RNG can then be dropped for this module.
The randomness is supplied using a method reference:
{code:java}
NaturalRanking nr1 = new NaturalRanking(new SplittableRandom()::nextInt);
UniformRandomProvider rng = RandomSource.KISS.create();
NaturalRanking nr2 = new NaturalRanking(rng::nextInt);
{code}
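As a rough illustration of why an IntUnaryOperator is sufficient, the sketch below shuffles the ranks of a run of ties using only a function that maps a positive int x to [0, x). The class and method names (TieShuffleSketch, shuffledTieRanks) are hypothetical and are not taken from the commit; the actual implementation may differ.

```java
import java.util.Arrays;
import java.util.SplittableRandom;
import java.util.function.IntUnaryOperator;

/** Hypothetical helper for illustration; not part of commons-statistics-ranking. */
final class TieShuffleSketch {
    private TieShuffleSketch() {}

    /**
     * Returns the ranks first, first + 1, ..., first + count - 1 in random order.
     *
     * @param first rank of the first tied value
     * @param count number of tied values
     * @param randomIntFunction maps a positive int x to a value in [0, x)
     * @return the shuffled ranks, each used exactly once
     */
    static int[] shuffledTieRanks(int first, int count, IntUnaryOperator randomIntFunction) {
        final int[] ranks = new int[count];
        for (int i = 0; i < count; i++) {
            ranks[i] = first + i;
        }
        // Fisher-Yates shuffle: swap position i with a random position in [0, i].
        for (int i = count - 1; i > 0; i--) {
            final int j = randomIntFunction.applyAsInt(i + 1);
            final int tmp = ranks[i];
            ranks[i] = ranks[j];
            ranks[j] = tmp;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // The method reference resolves to SplittableRandom.nextInt(int bound).
        final IntUnaryOperator source = new SplittableRandom()::nextInt;
        // Three tied values occupying ranks 1, 2 and 3.
        System.out.println(Arrays.toString(shuffledTieRanks(1, 3, source)));
    }
}
```

Any source with a bounded nextInt(int) method, such as a UniformRandomProvider, could be passed in the same way without this code depending on its type.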
Changes in commit:
f4b1e8c439b9319cfa019e303074f485aca4d060
> Port o.a.c.math.stat.ranking to a commons-statistics-ranking module
> -------------------------------------------------------------------
>
> Key: STATISTICS-63
> URL: https://issues.apache.org/jira/browse/STATISTICS-63
> Project: Commons Statistics
> Issue Type: New Feature
> Components: ranking
> Affects Versions: 1.0
> Reporter: Alex Herbert
> Priority: Major
> Fix For: 1.1
>
>
> The o.a.c.math4.legacy.stat.ranking package contains:
> {noformat}
> NaNStrategy.java
> NaturalRanking.java
> RankingAlgorithm.java
> TiesStrategy.java{noformat}
> There are no dependencies on other math packages.
> The TiesStrategy enum contains a RANDOM option:
> {noformat}
> "Ties get a random integral value from among applicable ranks."{noformat}
> I would suggest this is changed to
> {noformat}
> "Ties get a randomly assigned unique value from among applicable
> ranks."{noformat}
> This is a minor change, but it allows ties to always be distinguished, which
> seems to be the purpose of a tie strategy. The current implementation in math
> just picks a random number and so ties can be resolved by assigning the same
> rank to multiple points (thus not resolving anything).
> For example:
> {noformat}
> [0, 1, 1, 1, 2]{noformat}
> Can have an output of:
> {noformat}
> [0, 1, 2, 3, 4]
> [0, 1, 1, 1, 4]
> [0, 3, 3, 3, 4]
> etc{noformat}
> The suggested change would enumerate the ranks for the ties and then shuffle
> them. All ranks would then be unique:
> {noformat}
> [0, 1, 2, 3, 4]
> [0, 1, 3, 2, 4]
> [0, 3, 2, 1, 4]
> etc{noformat}
> A second issue with the ranking package is it brings in a dependency on
> UniformRandomProvider. If you do not supply one then an instance is created
> (which may not be needed).
> Given that we now support JDK 8 I suggest the default uses an instance of
> {{SplittableRandom}}. The user can override this by supplying a source of
> random bits as a {{LongSupplier}}. This can be used as a source of
> randomness for UniformRandomProvider from RNG. This is a functional interface
> and using the long bits it can create random rank indices as required. The
> package then does not expose non-JDK interfaces in its public API.
> Currently the NaturalRanking class has 6 constructors to set combinations for
> the three properties: TiesStrategy; NaNStrategy; and source of randomness.
> Current API:
> {noformat}
> public NaturalRanking()
> public NaturalRanking(TiesStrategy)
> public NaturalRanking(NaNStrategy)
> public NaturalRanking(NaNStrategy, TiesStrategy)
> public NaturalRanking(UniformRandomProvider)
> public NaturalRanking(NaNStrategy, UniformRandomProvider){noformat}
> The constructors that accept a TiesStrategy create a generator even though
> the TiesStrategy may not require it (i.e. is not RANDOM). The generator
> should be created on demand when ties occur in the data.
> Note: The set of constructors could be changed to a builder pattern. This
> would add builder object creation overhead for any new strategy. It also does
> not allow implicit setting of the TiesStrategy to RANDOM if a constructor
> with a source of randomness is used. An initial port can maintain the current
> 6 constructors. It can be changed before the first release.