While writing the test cases for KendallsCorrelation, I discovered an interesting behavior with SpearmansCorrelation that might be considered an inconsistency. SpearmansCorrelation.correlate() throws MathIllegalArgumentException if the array length is less than 2, but returns Double.NaN if the array contains multiple copies of a single value.
This seems inconsistent with how insufficient data is handled elsewhere in Apache Commons Math. The User Guide section on SimpleRegression says:

> When there are fewer than two observations in the model, or when there is no variation in the x values (i.e. all x values are the same) all statistics return NaN. At least two observations with different x coordinates are required to estimate a bivariate regression model.

Similarly, all the UnivariateStatistics return Double.NaN when there isn't enough data. When I'm computing various statistics on multiple datasets, it is unnecessarily cumbersome to handle an exception for one statistic and NaNs for the others.

I propose that PearsonsCorrelation and SpearmansCorrelation should return NaN whenever there is insufficient data, whether that is because there are too few observations (< 2) or too few unique values.
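To make the proposed behavior concrete, here is a minimal, self-contained sketch. The class NanSafeCorrelation and its pearson method are hypothetical names used only for illustration; this is not the Commons Math implementation and it does not call any Commons Math API. It assumes that mismatched array lengths should still be treated as a caller error and throw.

```java
public final class NanSafeCorrelation {

    private NanSafeCorrelation() { }

    /**
     * Pearson's r that reports insufficient data as NaN instead of throwing.
     * Hypothetical helper for illustration; not part of Commons Math.
     */
    public static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            // Assumption: a length mismatch is a programming error, not a data shortage.
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int n = x.length;
        if (n < 2 || isConstant(x) || isConstant(y)) {
            // Too few observations, or no variation in one of the variables.
            return Double.NaN;
        }

        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Returns true if every element equals the first one (exact comparison). */
    private static boolean isConstant(double[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] != values[0]) {
                return false;
            }
        }
        return true;
    }
}
```

With behavior like this, code that loops over many datasets can check Double.isNaN(...) on every statistic in the same way, rather than wrapping only the correlation calls in a try/catch.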