On Wednesday, 27 July 2016 at 13:18 -0700, Daniel Carrera wrote:
> Hello,
> 
> I was looking through the source code of trimmean() and I just
> realized that in general it does not remove data evenly from the top
> and bottom. Here is the source:
> 
> 
> """
>     trimmean(x, p)
> 
> Compute the trimmed mean of `x`, i.e. the mean after removing a
> proportion `p` of its highest- and lowest-valued elements.
> """
> function trimmean(x::RealArray, p::Real)
>     n = length(x)
>     n > 0 || error("x can not be empty.")
>     0 <= p < 1 || error("p must be non-negative and less than 1.")
>     rn = min(round(Int, n * p), n-1)
> 
>     sx = sort(x)
>     nl = rn >> 1
>     nh = (rn - nl)
>     s = 0.0
>     for i = (1+nl) : (n-nh)
>         @inbounds s += sx[i]
>     end
>     return s / (n - rn)
> end
> 
> 
> So this removes `nl` elements from the bottom and `nh` elements from
> the top. Sometimes these are the same number, and sometimes `nh` is
> one higher (whenever `rn` is odd). This means that trimmean()
> sometimes removes values unevenly. This is not how I have seen the
> trimmed mean defined. Every source that I know says that the trimmed
> mean removes the same number of elements from the top and bottom. For
> example, Wilcox (2010) says:
> "More generally, if we round [p * n] down to the nearest integer g,
> remove the g smallest and largest values and average the n - 2g
> values that remain". This distinction matters. There are theorems
> about how to compute the variance and confidence intervals for the
> trimmed mean that rely on one particular definition of the trimmed
> mean. If the definition changes, I can no longer compute a
> confidence interval for the computed value.
> 
> Another difference between the trimmean() function and the usual
> definition is that the "p% trimmed mean" should mean that you remove
> p% from the top and p% from the bottom, whereas in the trimmean()
> function it means that you remove (p/2)% from the top and (p/2)% from
> the bottom.
> 
> 
> Is there any chance that the definition of trimmean() could be
> changed in a future release to agree with Wilcox (2010) and other
> texts?
I guess so, in particular if you confirm that other major software
behaves that way, and even more so if you make a PR.
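
As a starting point for such a PR, here is a minimal sketch of the
symmetric definition quoted above from Wilcox (2010), next to the split
that the quoted trimmean() source uses. The names symmetric_trimmean and
current_split are hypothetical helpers made up for illustration, not
existing functions, and the restriction p < 0.5 is an assumption made so
that at least one element always remains.

```julia
# Hypothetical helper (not an existing function): symmetric trimmed mean
# following the Wilcox (2010) wording quoted above. Here `p` is the
# proportion removed from EACH end, so it must be below 0.5 (assumption).
function symmetric_trimmean(x::AbstractVector{<:Real}, p::Real)
    n = length(x)
    n > 0 || throw(ArgumentError("x must not be empty"))
    0 <= p < 0.5 || throw(ArgumentError("p must satisfy 0 <= p < 0.5"))
    g = floor(Int, p * n)      # round p*n down: elements dropped per end
    sx = sort(x)
    # Average the n - 2g values that remain after dropping g from each end.
    return sum(sx[(g + 1):(n - g)]) / (n - 2g)
end

# Hypothetical helper: reproduces the (nl, nh) arithmetic of the quoted
# trimmean() source, i.e. how many elements it drops from bottom and top.
function current_split(n::Integer, p::Real)
    rn = min(round(Int, n * p), n - 1)
    nl = rn >> 1
    nh = rn - nl
    return (nl, nh)
end
```

For example, current_split(10, 0.3) gives (1, 2): one value is dropped
from the bottom and two from the top, which is the uneven case described
in the original message. symmetric_trimmean([1, 2, 3, 4, 5, 6, 7, 100],
0.25) drops two values from each end and averages 3, 4, 5, 6.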


Regards
