[ 
https://issues.apache.org/jira/browse/STATISTICS-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642929#comment-17642929
 ] 

Alex Herbert commented on STATISTICS-63:
----------------------------------------

A further note on the TiesStrategy. I think that it should resolves ties such 
that if you pass the output ranks through the ranking algorithm again with the 
same strategy they should be unchanged. Currently the RANDOM strategy does not 
perform this. Ties can be assigned the same rank. A second pass through will 
detect these ties and may reorder them. This can continue until randomness 
eventually gets a resolution for the ties.

If the RANDOM strategy is changed to perform a SEQUENTIAL assignment of the 
ranks in order, then shuffles them, the result will have all ties resolved on 
the first pass, and ties would be randomly ranked among equal values.

Note that all the other tie strategies should satisfy this single pass criteria 
(it can be added to the test suite). 

> Port o.a.c.math.stat.ranking to a commons-statistics-ranking module
> -------------------------------------------------------------------
>
>                 Key: STATISTICS-63
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-63
>             Project: Commons Statistics
>          Issue Type: New Feature
>          Components: ranking
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Major
>
> The o.a.c.math4.legacy.stat.ranking package contains:
> {noformat}
> NaNStrategy.java
> NaturalRanking.java
> RankingAlgorithm.java
> TiesStrategy.java{noformat}
> There are no dependencies on other math packages.
> The TiesStrategy enum contains a RANDOM option:
> {noformat}
> "Ties get a random integral value from among applicable ranks."{noformat}
> I would suggest this is changed to
> {noformat}
> "Ties get a randomly assigned unique value from among applicable 
> ranks."{noformat}
> This is a minor change. But it allows ties to always be distinguished, which 
> seems to be the purpose of a tie strategy. The current implementation in math 
> just picks a random number and so ties can be resolved by assigning the same 
> rank to multiple points (thus not resolving anything).
> For example:
> {noformat}
> [0, 1, 1, 1, 2]{noformat}
> Can have an output of:
> {noformat}
> [0, 1, 2, 3, 4]
> [0, 1, 1, 1, 4]
> [0, 3, 3, 3, 4]
> etc{noformat}
> The suggested change would enumerate the ranks for the ties and then shuffle 
> them. All ranks would then be unique:
> {noformat}
> [0, 1, 2, 3, 4]
> [0, 1, 3, 2, 4]
> [0, 3, 2, 1, 4]
> etc{noformat}
> A second issue with the ranking package is it brings in a dependency on 
> UniformRandomProvider. If you do not supply one then an instance is created 
> (which may not be needed).
> Given that we now support JDK 8 I suggest the default uses an instance of 
> {{{}SplittableRandom{}}}. The user can override this by supplying a source of 
> random bits as a {{{}LongSupplier{}}}. This can be used as a source of 
> randomness for UniformRandomProvider from RNG. This is a functional interface 
> and using the long bits it can create random rank indices as required. The 
> package then does not expose non-JDK interfaces in its public API.
> Currently the NaturalRanking class has 6 constructors to set combinations for 
> the three properties: TiesStrategy; NaNStragtegy; and source of randomness. 
> Current API:
> {noformat}
> public NaturalRanking()
> public NaturalRanking(TiesStrategy)
> public NaturalRanking(NaNStrategy)
> public NaturalRanking(NaNStrategy, TiesStrategy)
> public NaturalRanking(UniformRandomProvider)
> public NaturalRanking(NaNStrategy, UniformRandomProvider){noformat}
> The constructors that accept a TiesStrategy create a generator even though 
> the TiesStrategy may not require it (i.e. is not RANDOM). The generator 
> should be created on demand when ties occur in the data.
> Note: The set of constructors could be changed to a builder pattern. This 
> would add builder object creation overhead for any new strategy. It also does 
> not allow implicit setting of the TiesStrategy to RANDOM if a constructor 
> with a source of randomness is used. An initial port can maintain the current 
> 6 constructors. It can be changed before the first release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to