Re: Statistics library

BCS Thu, 23 Oct 2008 17:35:14 -0700

Reply to Andrei,

dsimcha wrote:

Since there's really no good comprehensive statistics library for D
(Tango has a little bit, the beginnings of a few are on dsource, but
nothing much), I've been rolling my own statistics functions as
necessary.  Almost by accident, it seems like I've built up the
beginnings of a decent statistics library.  I'm debating whether it
might be interesting enough to people to be worth releasing, and
whether enough community help would be available to really make it
production quality, or to merge it with other people's efforts in
this area.  The following functionality is currently available:

Correlation (Pearson, Spearman rho, Kendall tau).  Note that the
Kendall tau correlation is a very efficient O(N log N) version.

Mean, standard deviation, variance, kurtosis, percent variance for
arrays of numeric values.

Shannon entropy, mutual information.

Kolmogorov-Smirnov tests.

Binomial, hypergeometric, normal, Poisson, and Kolmogorov CDFs;
hypergeometric, Poisson, and binomial PDFs.

Inverse normal distribution, and normally distributed random number
generation.

A struct to generate all possible permutations of a sequence.

On the other hand, I'm a scientist, not a full-time programmer, and
although I can write working code, I have no clue what it takes to
get code up to the gold standard of "production."  Also, this library
is very D2-dependent, and I have no interest in back-porting it.  Of
course if by some chance someone else wanted to back-port it, they'd
be more than welcome.

Most of the code is covered somehow or another by unit tests,
although I cheated a lot by having some unit tests depend on multiple
functions.

Is there any interest in this from others in the D community?  Do
other people think that D would benefit from having a decent
statistics library?  Other comments?

If the community is interested, I'd be glad to take over your code and
put it in Phobos.

Andrei

Even better would be getting it into both Phobos and Tango. That shouldn't be hard, as I can't imagine it depends on much.
