Re: Weighted Statistical Estimates

Mike Dusenberry Sat, 18 Feb 2017 22:49:20 -0800

+1


--

Michael W. Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

On Sat, Feb 18, 2017 at 10:04 PM, Niketan Pansare <npan...@us.ibm.com>
wrote:

> +1
>
> Thanks,
>
> Niketan
>
> > On Feb 18, 2017, at 10:01 PM, Arvind Surve <ac...@yahoo.com.INVALID> wrote:
> >
> > +1
> >
> > ------------------
> > Arvind Surve
> > Spark Technology Center
> > http://www.spark.tc/
> >
> > From: Felix Schüler <fschue...@posteo.de>
> > To: dev@systemml.incubator.apache.org
> > Sent: Saturday, February 18, 2017 9:42 PM
> > Subject: Re: Weighted Statistical Estimates
> >
> > Sounds good!
> >
> > -Felix
> >
> >> On 18.02.2017 21:20, Matthias Boehm wrote:
> >> Heading toward our 1.0 release, I'd like to create consistency across our
> >> weighted statistics. Conceptually, these weights represent frequency
> >> counts, i.e., multiplicities of input values.
> >>
> >> So far, our documentation does not state any restrictions on these weights,
> >> but some runtime operations require integer data (I), while others allow
> >> arbitrary floating point data, as indicated below:
> >>
> >> * moment
> >> * cov
> >> * aggregate
> >> * table
> >> * median (I)
> >> * quantile (I)
> >> * interQuartileMean (I)
> >>
> >> This can lead to unexpected errors, as shown by recent issues such as
> >> SYSTEMML-1265. Looking at R and its packages like Hmisc or reldist, it
> >> turns out that they all allow arbitrary weights.
> >>
> >> So, relaxing any restrictions of integer weights seems like the right
> >> choice. As this changes the external behavior - albeit in a generalizing
> >> manner - we should make this change now. If you have any concerns, let me
> >> know.
> >>
> >> Regards,
> >> Matthias
> >>
> >
> >
> >
>
>
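
For illustration only, here is a minimal Python sketch of how a weighted
quantile can accept arbitrary non-negative weights while still agreeing with
integer frequency counts, as the proposal quoted above intends. This is not
SystemML's implementation: the function name weighted_quantile and the
inverse-CDF definition without interpolation are assumptions made for this
example, and R's Hmisc or reldist may use different conventions.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_quantile(values, weights, p):
    """Return the smallest value whose cumulative weight reaches p * total.

    Hypothetical helper: weights may be any non-negative numbers. With
    integer weights, the result matches the inverse-CDF quantile of the
    data set in which each value is repeated `weight` times.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")

    # Drop zero-weight entries and sort the remaining pairs by value.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("at least one weight must be positive")

    # Running sum of weights in value order.
    cumulative = list(accumulate(w for _, w in pairs))
    target = p * cumulative[-1]

    # First position whose cumulative weight is >= target.
    idx = min(bisect_left(cumulative, target), len(pairs) - 1)
    return pairs[idx][0]


# Integer weights act as multiplicities: this data is [1, 1, 2, 3].
median_int = weighted_quantile([1, 2, 3], [2, 1, 1], 0.5)

# The same weights scaled to fractions describe the same distribution.
median_frac = weighted_quantile([1, 2, 3], [0.5, 0.25, 0.25], 0.5)
```

Because only the ratio of cumulative weight to total weight matters under
this definition, rescaling all weights by a positive constant leaves the
result unchanged, which is why nothing in it requires integer weights.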
