[jira] [Commented] (STATISTICS-62) Port o.a.c.math.stat.inference to a commons-statistics-inference module

Alex Herbert (Jira) Tue, 07 Feb 2023 05:52:09 -0800


    [ 
https://issues.apache.org/jira/browse/STATISTICS-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685314#comment-17685314
 ]


Alex Herbert commented on STATISTICS-62:
----------------------------------------

Use of a generic Options class requires the following API:
{code:java}
public class Options {
    public static class Builder {
        public Builder setAlternative(AlternativeHypothesis v)
        public Builder setEqualVariances(boolean v)
        public Builder setMu(double v)
        public Builder setDegreesOfFreedomAdjustment(int v)
        public Builder setPValueMethod(PValueMethod v)
        public Builder setStrictInequality(boolean v)
        public Builder setContinuityCorrection(boolean v)
        Options build()
    }
    public static Options defaults()
    public static Builder builder()
    public Builder toBuilder()
    public AlternativeHypothesis getAlternative()
    public boolean isEqualVariances()
    public double getMu()
    public int getDegreesOfFreedomAdjustment()
    public PValueMethod getPValueMethod()
    public boolean isStrictInequality()
    public boolean isContinuityCorrection()
}{code}
However I am wary of this as each test uses different options. Providing a 
single options class decouples the options specific to the test, in some cases 
the options are only applicable to a single test, and other cases the test only 
uses 1 option:
||Test||Alternative||Equal.Var||Mu||DoF Adjust||P-value method||Strict 
Inequality||Correction||Total||
|Binomial|Y| | | | | | |*1*|
|Chi-square| | | |Y| | | |*1*|
|G| | | |Y| | | |*1*|
|KS|Y| | | |Y|Y| |*3*|
|Mann-Whitney|Y| |Y| |Y| |Y|*4*|
|T|Y|Y|Y| | | | |*3*|
|Wilcoxon|Y| |Y| |Y| |Y|*4*|
|*Total*|*5*|*1*|*3*|*2*|*3*|*1*|*2*| |

The options for the Chi-square and G test are the same (since the tests both 
perform the same type of computation on observed counts). The options for the 
Mann-Whitney and Wilcoxon test are also the same. In R these two tests are 
performed by the same method due to their similarity as they both act on ranked 
data, either paired or unpaired.

Note that in many cases the default value is unlikely to be changed. Perhaps 
the only parameters which may be changed with any frequency are: the 
alternative hypothesis; the equal variances assumption for the t-test; and the 
mu value for the single sample t-test (i.e. comparing the mean of a sample to a 
non-zero expected mean). Given this it may be simpler to remove the Builder 
pattern and provide methods to change individual options. An example for the 
T-test:
{code:java}
    public static class Options {
        public static Options defaults()
        public AlternativeHypothesis getAlternative()
        public boolean isEqualVariances()
        public double getMu()
        public Options withAlternative(AlternativeHypothesis v)
        public Options withEqualVariances(boolean v)
        public Options withMu(double v)
    }{code}
Allowing:
{code:java}
double[] x = {1, 2, 3};
double[] y = {1.1, 2.2, 3.3};
TTest.test(x, y, 
Options.defaults().withAlternative(AlternativeHypothesis.LESS_THAN))

// Mean, Variance, Size
double m = 1.24;
double v = 2.3;
long n = 42;
TTest.test(m, v, n, Options.defaults().withMu(1.23)){code}
{quote}I missed the reason for the {{test(...)}} method to be {{{}static{}}}.
{quote}
The test methods do not have any state other than options, and no state beyond 
the computed result. This is suited to either a static method with configurable 
options or a configurable test instance. There is not any difference to the 
number of methods in the API so it comes down to which you prefer:
{code:java}
TTest.test(x, y)  // calls test using the default instance (of the test Class 
or test Options)

TTest.test(x, y, Options.defaults())
TTest.test(x, y, Options.defaults().withMu(1.23))

TTest.instance().test(x, y)
TTest.instance().withMu(1.23).test(x, y)
TTest.instance().withAlternative(AlternativeHypothesis.LESS_THAN).test(x, 
y){code}
Perhaps the test instance API is more Java-like and readable. Also the 
*{{{}w{}}}{{{}ith{}}}* state change methods arguably do not require explicit 
names when the argument is an enum:
{code:java}
KolmogorovSmirnovTest.instance()
                     .with(AlternativeHypothesis.LESS_THAN)
                     .with(PValueMethod.EXACT)
                     .with(Inequality.STRICT)
                     .test(x, y){code}
The instance API could use a builder pattern to reduce object creation when 
setting multiple options:
{code:java}
KolmogorovSmirnovTest.builder()
                     .set(AlternativeHypothesis.LESS_THAN)
                     .set(PValueMethod.EXACT)
                     .set(Inequality.STRICT)
                     .build()
                     .test(x, y) {code}
This always creates 2 objects (the builder and the new built instance). So 
really is only an advantage for tests that have more than 2 options where the 
'with' pattern will create an object for each changed option. This is 
significant if more than 2 options are realistically going to change on a 
regular basis when using the test. I do not think the case. For the tests that 
have more than 2 options I think it unlikely you would wish to set more than 2 
to non default options (i.e. leave p-value method to AUTO, leave continuity 
correction enabled, leave strict inequality to false (since the classic KS 
definition where it applies is Pr(D >= d)).

So I am +1 to an instance pattern for the test classes using with methods to 
set state that can change the test behaviour.

 

> Port o.a.c.math.stat.inference to a commons-statistics-inference module
> -----------------------------------------------------------------------
>
>                 Key: STATISTICS-62
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-62
>             Project: Commons Statistics
>          Issue Type: New Feature
>          Components: inference
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Major
>
> The o.a.c.math4.legacy.stat.inference package contains:
>  
> {noformat}
> AlternativeHypothesis.java
> BinomialTest.java
> ChiSquareTest.java
> GTest.java
> InferenceTestUtils.java
> KolmogorovSmirnovTest.java
> MannWhitneyUTest.java
> OneWayAnova.java
> TTest.java
> WilcoxonSignedRankTest.java{noformat}
> The are few dependencies on other math packages. The notable exceptions are:
>  
> 1. KolmogorovSmirnovTest which requires matrix support. This is for 
> multiplication of a square matrix to support a matrix power function. This 
> uses a double matrix and the same code is duplicated for a BigFraction 
> matrix. Such code can be ported internally to support only the required 
> functions. It can also drop the defensive copy strategy used by Commons Math 
> in matrices to allow multiply in-place where appropriate for performance 
> gains.
> 2. OneWayAnova which collates the sum, sum of squares and count using 
> SummaryStatistics. This can be done using an internal class. It is possible 
> to call the test method using already computed SummaryStatistics. The method 
> that does this using the SummaryStatistics as part of the API can be dropped, 
> or supported using an interface that returns: getSum, getSumOfSquares, getN.
> All the inference Test classes have instance methods but no state. The 
> InferenceTestUtils is a static class that holds references to a singleton for 
> each class and provides static methods to pass through the underlying 
> instances.
> I suggest changing the test classes to have only static methods and dropping 
> InferenceTestUtils.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (STATISTICS-62) Port o.a.c.math.stat.inference to a commons-statistics-inference module

Reply via email to