Alex Herbert created STATISTICS-63:
--------------------------------------
Summary: Port o.a.c.math.stat.ranking to a
commons-statistics-ranking module
Key: STATISTICS-63
URL: https://issues.apache.org/jira/browse/STATISTICS-63
Project: Commons Statistics
Issue Type: New Feature
Components: ranking
Affects Versions: 1.0
Reporter: Alex Herbert
The o.a.c.math4.legacy.stat.ranking package contains:
{noformat}
NaNStrategy.java
NaturalRanking.java
RankingAlgorithm.java
TiesStrategy.java{noformat}
There are no dependencies on other math packages.
The TiesStrategy enum contains a RANDOM option:
{noformat}
"Ties get a random integral value from among applicable ranks."{noformat}
I would suggest this is changed to
{noformat}
"Ties get a randomly assigned unique value from among applicable
ranks."{noformat}
This is a minor change. But it allows ties to always be distinguished, which
seems to be the purpose of a tie strategy. The current implementation in math
just picks a random number and so ties can be resolved by assigning the same
rank to multiple points (thus not resolving anything).
For example:
{noformat}
[0, 1, 1, 1, 2]{noformat}
Can have an output of:
{noformat}
[0, 1, 2, 3, 4]
[0, 1, 1, 1, 4]
[0, 3, 3, 3, 4]
etc{noformat}
The suggested change would enumerate the ranks for the ties and then shuffle
them. All ranks would then be unique:
{noformat}
[0, 1, 2, 3, 4]
[0, 1, 3, 2, 4]
[0, 3, 2, 1, 4]
etc{noformat}
A second issue with the ranking package is it brings in a dependency on
UniformRandomProvider. If you do not supply one then an instance is created
(which may not be needed).
Given that we now support JDK 8 I suggest the default uses an instance of
{{{}SplittableRandom{}}}. The user can override this by supplying a source of
random bits as a {{{}LongSupplier{}}}. This can be used as a source of
randomness for UniformRandomProvider from RNG. This is a functional interface
and using the long bits it can create random rank indices as required. The
package then does not expose non-JDK interfaces in its public API.
Currently the NaturalRanking class has 6 constructors to set combinations for
the three properties: TiesStrategy; NaNStragtegy; and source of randomness.
Current API:
{noformat}
public NaturalRanking()
public NaturalRanking(TiesStrategy)
public NaturalRanking(NaNStrategy)
public NaturalRanking(NaNStrategy, TiesStrategy)
public NaturalRanking(UniformRandomProvider)
public NaturalRanking(NaNStrategy, UniformRandomProvider){noformat}
The constructors that accept a TiesStrategy create a generator even though the
TiesStrategy may not require it (i.e. is not RANDOM). The generator should be
created on demand when ties occur in the data.
Note: The set of constructors could be changed to a builder pattern. This would
add builder object creation overhead for any new strategy. It also does not
allow implicit setting of the TiesStrategy to RANDOM if a constructor with a
source of randomness is used. An initial port can maintain the current 6
constructors. It can be changed before the first release.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)