[
https://issues.apache.org/jira/browse/STATISTICS-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645994#comment-17645994
]
Alex Herbert commented on STATISTICS-62:
----------------------------------------
In short, yes to refactoring. But I do not know the best way to consolidate the
package.
I have done the initial work to remove all the math dependencies. Next step is
to get the test suite working. I've made notes on all changes made.
There is a lot of inconsistency across the classes so I hope to resolve all
this. Most classes could be static. But the MannWhitneyUTest and
WilcoxonSignedRankTest can be passes a RankingAlgorithm to change their
behaviour. The KolmogorovSmirnovTest has 3 methods to compute the p-value for
the single sample test, and 4 methods for the 2 sample test. It also has a
method to resolve ties when the sample size is small and the p-value method is
exact with no explanation of why this is required. It would work without tie
resolution. If tie resolution is required it should be modified to perform this
in a single pass rather than using random jitter. I have to check the repo
commit history to see why this has been done.
In order to support all tests in their current form I think that we can keep
the instance based methods and maintain a singleton for the default settings.
Any other settings could be accessed through a builder pattern to create an
instance to perform the desired computation. The test instances then have these
methods:
{code:java}
XyzTest t = XyzTest.instance();
XyzTest t = XyzTest.builder()
.configure(...)
.build();
double statistic(...); // compute the test statistic
double test(...); // compute the test p-value
boolean test(..., double alpha); // compute a reject/accept test{code}
The idea behind the builder is to support a cleaner API where the tests are
performed using the same methods, irrespective of the method used to compute
the statistic, or p-value. It allows the classes where many options are
available such as the KolmogorovSmirnovTest to be configured beyond the
defaults:
{code:java}
double[] x;
double[] y;
// Old
public double kolmogorovSmirnovTest(double[] x, double[] y)
public double monteCarloP(double d,
int n,
int m,
boolean strict,
int iterations,
UniformRandomProvider rng);
KolmogorovSmirnovTest t = new KolmogorovSmirnovTest();
double = t.monteCarloP(t.kolmogorovSmirnovTest(x, y),
x.length,
y.length,
true,
100000,
rng);
// New
double p1 = KolmogorovSmirnovTest.instance().test(x, y);
double p2 = KolmogorovSmirnovTest.builder()
.method(KolmogorovSmirnovTest.TwoSample.MONTE_CARLO)
.strict(true)
.iterations(100000)
.randomSource(rng)
.build()
.test(x, y);{code}
Here the amount of typing has not really changed.
It does make documenting all the options more difficult as the test methods are
separated from their arguments. In the case above an enum
KolmogorovSmirnovTest.TwoSample has been used to provide computation options
for the two-sample test, then the parameters strict, iterations and
randomSource must be documented to apply to this two-sample method.
However a builder is not always suitable. See the TTest which offers:
* One-sample or two-sample
* One-sided or two-sided
* Paired or unpaired (for two-sample tests)
* Homoscedastic (equal variance assumption) or heteroscedastic (for two sample
tests)
In this case a builder is not really required. This can be done using a
singleton instance for each test type. However the paired test and assumptions
on variance only applies to two sample computation.
Thus there are issues with the builder pattern as the returned objects are not
conforming to a set interface. Some tests support one-sample and two-sample
tests. Some support input as a double[], and also using some form of summary
statistics (OneWayAnova accepts mean, variance and size of each sample). Some
tests have variations based on test type (T-test homo/heteroscedastic,
paired/unpaired). Some tests have many options to compute the p-value
(KolmogorovSmirnovTest).
I will get the test suite working and can commit an initial port with the API
mostly unchanged. Then we can discuss how to refactor and keep this refactoring
in the revision history. Or we can discuss how to refactor first and commit a
clean revision history with the new API.
> Port o.a.c.math.stat.inference to a commons-statistics-inference module
> -----------------------------------------------------------------------
>
> Key: STATISTICS-62
> URL: https://issues.apache.org/jira/browse/STATISTICS-62
> Project: Commons Statistics
> Issue Type: New Feature
> Components: inference
> Affects Versions: 1.0
> Reporter: Alex Herbert
> Priority: Major
>
> The o.a.c.math4.legacy.stat.inference package contains:
>
> {noformat}
> AlternativeHypothesis.java
> BinomialTest.java
> ChiSquareTest.java
> GTest.java
> InferenceTestUtils.java
> KolmogorovSmirnovTest.java
> MannWhitneyUTest.java
> OneWayAnova.java
> TTest.java
> WilcoxonSignedRankTest.java{noformat}
> The are few dependencies on other math packages. The notable exceptions are:
>
> 1. KolmogorovSmirnovTest which requires matrix support. This is for
> multiplication of a square matrix to support a matrix power function. This
> uses a double matrix and the same code is duplicated for a BigFraction
> matrix. Such code can be ported internally to support only the required
> functions. It can also drop the defensive copy strategy used by Commons Math
> in matrices to allow multiply in-place where appropriate for performance
> gains.
> 2. OneWayAnova which collates the sum, sum of squares and count using
> SummaryStatistics. This can be done using an internal class. It is possible
> to call the test method using already computed SummaryStatistics. The method
> that does this using the SummaryStatistics as part of the API can be dropped,
> or supported using an interface that returns: getSum, getSumOfSquares, getN.
> All the inference Test classes have instance methods but no state. The
> InferenceTestUtils is a static class that holds references to a singleton for
> each class and provides static methods to pass through the underlying
> instances.
> I suggest changing the test classes to have only static methods and dropping
> InferenceTestUtils.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)