NumPy gives the same results as above, but interestingly R gives the 
mathematically correct result. It appears that R uses a two-pass approach 
for computing the mean:
https://github.com/wch/r-source/blob/6338928e70891145221c781a19d2bb6b17868fb5/src/main/summary.c#L429-L437
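
For reference, a sketch of the same two-pass idea in Julia (an editorial 
illustration, not a translation of the R code; twopass_mean is just a 
placeholder name):

function twopass_mean(x)
    n = length(x)
    m = sum(x) / n      # first pass: ordinary (possibly slightly inaccurate) mean
    c = zero(m)
    for xi in x         # second pass: accumulate the residuals, which are
        c += xi - m     # tiny and essentially cancellation-free
    end
    return m + c / n    # correct the first estimate by the mean residual
end

For the all-equal example later in this thread, the residuals are all the 
same representable value, so the correction should recover the input exactly.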

-s

On Monday, 7 September 2015 10:47:42 UTC+1, Simon Byrne wrote:
>
> The problem is that the computation of the mean is inaccurate: although 
> the relative error is small, when you perform the subtraction for computing 
> the standard deviation, as the numbers are almost the same, the relative 
> error is massively amplified (this is typically known as *catastrophic 
> cancellation*).
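
To make the cancellation concrete with the numbers from the original post 
(editorial sketch; mean and std as used in the thread):

using Statistics   # needed on Julia >= 0.7; in 2015 these were in Base
total = 0.11008615848501174
x = fill(total, 100)
m = mean(x)        # off from total by roughly one ulp (~1e-17 absolute)
total - m          # should be exactly 0.0, but comes out around 1e-17:
                   # the true difference is zero, so the tiny absolute error
                   # of the mean becomes the entire result
std(x)             # consequently std is ~1e-17 instead of exactly 0.0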
>
> One thing we could do here is compute the mean using compensated summation 
> and Dekker-style division. But I'm not sure it's really worth the effort.
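
For reference, compensated (Kahan) summation for the mean might look like the 
sketch below; the Dekker-style correction of the final division is omitted, 
and kahan_mean is just an illustrative name:

function kahan_mean(x)
    s = zero(eltype(x))   # running sum
    c = zero(eltype(x))   # running compensation for lost low-order bits
    for xi in x
        y = xi - c        # re-inject the error carried from the last step
        t = s + y         # big + small: the low-order bits of y are lost here...
        c = (t - s) - y   # ...so recover them algebraically
        s = t
    end
    return s / length(x)  # this final division still rounds, which is where
end                       # the Dekker-style division would come in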
>
> -simon
>
> On Monday, 7 September 2015 09:03:18 UTC+1, Tim Holy wrote:
>>
>> On Sunday, September 06, 2015 11:46:05 PM Diego Javier Zea wrote: 
>> > Then, what is the best way to test if a number is equal to 0.0? 
>>
>> Not to do it. Much better is to see whether it's much smaller than 
>> something finite, e.g., sta < 100*eps(ave). ave provides the scale 
>> against which sta should be measured. 
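
Applied to the loop of the calculatezscore function quoted further down, that 
test might replace the comparison against 0.0 like this (an editorial sketch, 
not Tim's code):

if val ≈ ave
    zscore[i] = 0.0
elseif sta >= 100 * eps(ave)   # sd is meaningfully larger than the rounding scale of ave
    zscore[i] = (val - ave) / sta
else
    zscore[i] = NaN            # sd indistinguishable from zero at this scale
end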
>>
>> --Tim 
>>
>>
>> > 
>> > 2015-09-06 22:47 GMT-03:00 Tim Holy <[email protected]>: 
>> > > 0 is a special case because there is no "scale" by which to measure 
>> > > "closeness": 
>> > > 
>> > > julia> eps(1.0) 
>> > > 2.220446049250313e-16 
>> > > 
>> > > julia> eps(1e-10) 
>> > > 1.2924697071141057e-26 
>> > > 
>> > > julia> eps(1e-100) 
>> > > 1.2689709186578246e-116 
>> > > 
>> > > --Tim 
>> > > 
>> > > On Sunday, September 06, 2015 06:39:06 PM Diego Javier Zea wrote: 
>> > > > Thanks Tim! But I don't fully understand it, because I believed that 
>> > > > *isapprox()* should be *true*, but it gives me *false* in this case. 
>> > > > My particular function with the problem uses the *std()* output in 
>> > > > the following way. Is this correct? I'm using *isapprox* and 
>> > > > *!isapprox*: 
>> > > > function calculatezscore{T,N}(value::AbstractArray{T,N}, 
>> > > >                               average::AbstractArray{T,N}, 
>> > > >                               sd::AbstractArray{T,N}) 
>> > > >   zscore = similar(value) 
>> > > >   if size(value) == size(average) == size(sd) 
>> > > >     for i in eachindex(zscore) 
>> > > >       val = value[i] 
>> > > >       ave = average[i] 
>> > > >       sta = sd[i] 
>> > > >       if val ≈ ave 
>> > > >         zscore[i] = 0.0 
>> > > >       elseif sta ≉ 0.0 
>> > > >         zscore[i] = (val - ave)/sta 
>> > > >       else 
>> > > >         zscore[i] = NaN 
>> > > >       end 
>> > > >     end 
>> > > >   else 
>> > > >     throw(ErrorException("The elements should have the same size")) 
>> > > >   end 
>> > > >   zscore 
>> > > > end 
>> > > > 
>> > > > Are there other ways to manage the realities of floating-point arithmetic? 
>> > > > 
>> > > > On Sunday, 6 September 2015 at 22:27:23 (UTC-3), Tim Holy wrote: 
>> > > > > 1e-17 is indistinguishable from mathematical zero if it's based 
>> > > > > on differences among numbers that are O(1). 
>> > > > > 
>> > > > > julia> a = fill(total, 100); 
>> > > > > 
>> > > > > julia> sum(a)/100 == total 
>> > > > > false 
>> > > > > 
>> > > > > julia> sum(a)/100 - total 
>> > > > > -1.3877787807814457e-17 
>> > > > > 
>> > > > > But this is off by only one ulp: 
>> > > > > 
>> > > > > julia> nextfloat(sum(a)/100) == total 
>> > > > > true 
>> > > > > 
>> > > > > So my advice is that you should write your algorithms in a manner 
>> > > > > that takes the realities of floating-point arithmetic into account. 
>> > > > > 
>> > > > > --Tim 
>> > > > > 
>> > > > > On Sunday, September 06, 2015 05:19:36 PM Diego Javier Zea wrote: 
>> > > > > > The returned value from std should be zero if all the values are 
>> > > > > > equal (since the distance between the mean and the values should 
>> > > > > > be zero). However, the returned value isn't zero: 
>> > > > > > 
>> > > > > > julia> total 
>> > > > > > 0.11008615848501174 
>> > > > > > 
>> > > > > > julia> mean(Float64[total for i in 1:100]) 
>> > > > > > 0.11008615848501173 
>> > > > > > 
>> > > > > > julia> std(Float64[total for i in 1:100]) 
>> > > > > > 1.3947701538996772e-17 
>> > > > > > 
>> > > > > > julia> std(Float64[total for i in 1:100]) ≈ zero(Float64) 
>> > > > > > false 
>> > > > > > 
>> > > > > > julia> mean(Float64[total for i in 1:100]) ≈ total 
>> > > > > > true 
>> > > > > > 
>> > > > > > 
>> > > > > > How can I get a more precise result from *std()*? 
>> > > > > > 
>> > > > > > Best, 
>>
>>
