Since there's really no good comprehensive statistics library for D
(Tango has a little, and the beginnings of a few are on dsource, but
nothing much), I've been rolling my own statistics functions as
needed. Almost by accident, I seem to have built up the beginnings of
a decent statistics library. I'm debating whether it would interest
enough people to be worth releasing, and whether enough community help
would be available to make it truly production quality, or to merge it
with other people's efforts in this area. The following functionality
is currently available:
- Correlation (Pearson, Spearman's rho, Kendall's tau). Note that the
  Kendall tau correlation is an efficient O(N log N) implementation.
- Mean, standard deviation, variance, kurtosis, and percent variance
  for arrays of numeric values.
- Shannon entropy and mutual information.
- Kolmogorov-Smirnov tests.
- CDFs: binomial, hypergeometric, normal, Poisson, Kolmogorov.
- Probability mass functions (PDFs): hypergeometric, Poisson, binomial.
- Inverse normal distribution, and generation of normally distributed
  random numbers.
- A struct to generate all possible permutations of a sequence.
On the other hand, I'm a scientist, not a full-time programmer, and
although I can write working code, I have no idea what it takes to
bring code up to the gold standard of "production." Also, this library
depends heavily on D2, and I have no interest in back-porting it to D1.
Of course, if someone else wanted to back-port it, they'd be more than
welcome.
Most of the code is covered by unit tests in one way or another,
although I cheated a lot: some unit tests exercise several functions
at once.
Is there any interest in this from others in the D community? Do
other people think that D would benefit from having a decent
statistics library? Other comments?