On Fri, 8 Nov 2013 07:35:01 -0500, Matt Adereth wrote:
While writing the test cases for KendallsCorrelation, I discovered an
interesting behavior of SpearmansCorrelation that might be considered
an inconsistency. SpearmansCorrelation.correlate() throws
MathIllegalArgumentException if the array length is less than 2, but
returns Double.NaN if the array contains multiple copies of a single
value.
This seems inconsistent with how insufficient data is handled
elsewhere in Apache Commons Math.
In the User Guide for SimpleRegression it says:
  "When there are fewer than two observations in the model, or when
  there is no variation in the x values (i.e. all x values are the
  same) all statistics return NaN. At least two observations with
  different x coordinates are required to estimate a bivariate
  regression model."
Similarly, all the UnivariateStatistics return Double.NaN when there
isn't enough data.
When I'm computing various statistics on multiple datasets, it seems
unnecessarily cumbersome to handle an exception specially for one
statistic and NaNs for the others. I propose that PearsonsCorrelation
and SpearmansCorrelation should return NaN if there is insufficient
data, whether from too few observations (< 2) or too few unique
values.
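To make the proposal concrete, here is a minimal sketch of the NaN
convention. It uses a hypothetical standalone helper, pearson(); this
is not the Commons Math API. The helper computes Pearson's correlation
and returns Double.NaN instead of throwing when there are fewer than
two observations or when either array is constant. Spearman's
correlation is Pearson's correlation applied to the ranks, and a
constant array has all-tied (hence constant) ranks, so the same two
checks would cover SpearmansCorrelation.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the proposed convention (not Commons Math code):
 * insufficient data yields Double.NaN instead of an exception.
 */
public class NanOnInsufficientData {

    // True if every element equals the first one (no variation).
    static boolean isConstant(double[] a) {
        for (double v : a) {
            if (v != a[0]) {
                return false;
            }
        }
        return true;
    }

    // Pearson's product-moment correlation; NaN for insufficient data.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            // Assumption: mismatched lengths are a caller error, so still throw.
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int n = x.length;
        if (n < 2) {
            return Double.NaN; // not enough observations
        }
        if (isConstant(x) || isConstant(y)) {
            return Double.NaN; // no variation, so the correlation is undefined
        }
        double meanX = Arrays.stream(x).average().getAsDouble();
        double meanY = Arrays.stream(y).average().getAsDouble();
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        System.out.println(pearson(new double[] {1}, new double[] {2}));             // NaN
        System.out.println(pearson(new double[] {3, 3, 3}, new double[] {1, 2, 3})); // NaN
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {2, 4, 6})); // 1.0
    }
}
```

The constancy check compares values directly rather than testing
whether the sum of squares is zero. Rounding in the mean can leave a
tiny nonzero sum of squares for constant data. Mismatched lengths
still throw in this sketch, since that is a caller error rather than
insufficient data.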
At first sight, I'd rather expect that an identified problem (such as
"insufficient data") would raise an appropriate exception, whereas NaN
could result from other problems.
Regards,
Gilles