I'm working on a probability distribution library myself (ProDist) I've got an 
initial set of 1D distributions implemented and I'm working on Copulas now (A 
copula is the interdependency structure between dimensions of a multivariate 
continuous distribution).  Discrete and mixed Discrete and continuous 
dimensions will follow.  

I made distributions into polymorphic objects, and the classes can auto-best 
fit instances of themselves to data via the minimum chi squared method (maximum 
likelihood tends to over fit, maximum entropy tends to under fit, minimum chi 
squared is somewhere in the middle, although I'll also need maximum likelihood 
for when there aren't enough points to do minimum chi squared).  The point is 
that functions like mean, variance, std dev, skewness, (excess) kurtosis, 
entropy, pdf, cdf, inverse cdf, draw (including vector and matrix valued of the 
4 previous) central and non-central moments,  etc are member function on the 
object, so the corresponding call would be

double a(1.5), b(3.0);
prodist::cont::dist1d::Beta(a,b) beta;  //cont means continuous, dist1d 
distinguishes from copula and distnd directory/namespaces.
double mean(beta.mean());

which I think is more natural than the boost syntax

also having distributions be objects allows for non-parametric distributions 
like point sets, histograms, kernel density estimates (not yet implemented, 
although I have a automagically binned histogram class whose CDF in enhanced by 
a PCHIP which is comparable to ISJ KDE's in terms of accuracy while being a lot 
faster and enabling piecewise analytical evaluation of all member functions) 
and is consistent with how we need to handle multidimensional distributions.

I'm using the armadillo linear algebra library as part of the interface

Since all continuous 1d distributions provide moments of arbitrary order, that 
enables "analytical" gram-schmidt orthonormalization to get ortho normal 
polynomials of arbitrary order, which you can use an armadillo roots (eigen 
solve) on to get all roots which permits polynomial chaos/stochastic 
collocation, and you get that for free with every new continuous 1d 
distribution you implement.

Admittedly I have implemented a rather limited set of continuous1d 
distributions to date, but the non-parametric distributions take up the slack 
until I get around to doing more, FFT (wrapping FFTW3) based convolutions of 
PDFs is another TODO. 

You can wrap your favorite RNG in a subclass of prodist::cont::dist1d::U01Base, 
"all" (well not dirac delta which is completely deterministic, and null which 
is basically an error code) other distributions (including discrete and 
multidimensional) take a shared_ptr to a U01base subclass (possibly the default 
object) but for efficient parallel execution you can wrap say random123 (a 
library of counter based rng's) in U01Base subclasses and give a different 
instance to individual distributions to avoid having every thread waiting on 
the same locking function for it's turn to get a draw.   RandomStream is 
another good pseudo random number generator library you can wrap in a subclass 
of U01Base if you want to guarantee repeatability in parallel applications.

But like I hinted at above, you can call draw() on all distributions to use 
them like RNGs that produce points (1D or multiD) that obey any concrete 
distribution (including conditional, i.e.  specifying a subset of dimensions, 
draws in the multidimensional case).

The point of all that is I think (and am betting on) treating distributions as 
first class polymorphic objects with associated member functions provide 
numerous advantages over the boost way of doing things.  I know that ProDist 
will never be part of boost because it uses armadillo (and is too high level) 
but it may be advantageous to rethink the boost way of doing things.

-----Original Message-----
From: Boost-users <boost-users-boun...@lists.boost.org> On Behalf Of Warren 
Weckesser via Boost-users
Sent: Sunday, May 1, 2022 7:28 PM
To: Boost-users <boost-users@lists.boost.org>
Cc: Warren Weckesser <warren.weckes...@gmail.com>
Subject: [EXTERNAL] [Boost-users] [Math / Distributions] How do mean, variance, 
etc end up in scope?

This is probably more of a C++ question than a Boost/Math question.  I think 
the answer comes down to a (surprising?) Boost namespace convention.

In a program like this:

```
#include <boost/math/distributions/beta.hpp>

using boost::math::beta_distribution;

int main()
{
    float a = 1.5f;
    float b = 3.0f;
    beta_distribution<float> dist(a, b);

    auto m = mean(dist);
}
```

how does the name 'mean' end up in scope in 'main()'? I would have expected 
(and preferred!) something like 'using boost::math::mean;' to be required.

Warren
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
https://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.boost.org%2Fmailman%2Flistinfo.cgi%2Fboost-users&amp;data=05%7C01%7Ckdalbey%40sandia.gov%7Cdf3c602edc7e4297aed208da2bdb10e7%7C7ccb5a20a303498cb0c129007381b574%7C1%7C0%7C637870517152026118%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=rrJ8hdCVD5wWF31e3LCcNGlmXLSFSL4LkNX3CFhc1P4%3D&amp;reserved=0
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
https://lists.boost.org/mailman/listinfo.cgi/boost-users

Reply via email to