Weighted Statistical Estimates

Matthias Boehm Sat, 18 Feb 2017 21:21:07 -0800

Going toward our 1.0 release, I'd like to make our weighted statistics
consistent. Conceptually, these weights represent frequency counts, i.e.,
multiplicities of input values.


So far, our documentation does not state any restrictions on these weights,
but some runtime operations require integer data (marked I below), while
others allow arbitrary floating point data:

* moment
* cov
* aggregate
* table
* median (I)
* quantile (I)
* interQuartileMean (I)

This can lead to unexpected errors as shown by recent issues such as
SYSTEMML-1265. Looking back to R and its packages like Hmisc or reldist, it
turns out that they all allow arbitrary weights.
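
To illustrate why non-integer weights are well defined for order statistics,
here is a minimal Python sketch. It is not SystemML's implementation and not
the exact definition used by Hmisc or reldist; the function name
weighted_quantile and the inverse-CDF rule (return the smallest value whose
cumulative weight reaches p times the total weight) are assumptions chosen
for illustration only.

```python
def weighted_quantile(values, weights, p):
    """Hypothetical helper: smallest value whose cumulative weight
    reaches p * (total weight). Weights may be any non-negative floats."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equal length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    # Sort by value so cumulative weights follow the empirical CDF.
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("total weight must be positive")

    threshold = p * total
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]  # guard against floating point round-off


values = [1, 2, 3, 4]

# Integer weights act as multiplicities: same result as replicating the data.
replicated = [1, 2, 2, 3, 4]
median_int = weighted_quantile(values, [1, 2, 1, 1], 0.5)
median_rep = weighted_quantile(replicated, [1] * len(replicated), 0.5)

# Fractional weights (here the same weights scaled by 0.5) work unchanged.
median_frac = weighted_quantile(values, [0.5, 1.0, 0.5, 0.5], 0.5)
```

Under this rule, integer weights reproduce the result of replicating each
value, and fractional weights need no special handling because only the
cumulative weight relative to the total matters.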

So, removing the restriction to integer weights seems like the right
choice. As this changes the external behavior - albeit in a generalizing
manner - we should make this change now, before the 1.0 release. If you
have any concerns, let me know.

Regards,
Matthias
