Okay so I have completely hand coded the cov/var calculation to avoid any 
overhead, and now I get around the 20x speedup you mentioned. I really 
don't see any significant benefit of returning a tuple, allocating the 2 
element array has an insignificant overhead even for flot64 vectors of size 
100000 inside hot loops. I really don't like the ugly output and lack 
definitions of arithmatic that returning the tuple forces for no clear 
benefit in my mind. If this is really wanted I would much rather have a 
linreg! method, so that the default behavior is not effected by this 
optimization. I will of course change this if the will of the people is to 
have a tuple, but I am -1.

Another quick question do I make a new issue/pull request for this 
behavior? As it is no longer the trivial change of my current pull request. 
Also should this function be moved out of linalg.jl since it is really not 
using the linear algebra routines anymore?

The code I have come up with is:

function linreg5{S <: Number, T <: Number}(x::AbstractArray{S}, 
y::AbstractArray{T})
    n = length(x)
    @assert n == length(y)
    mx = mean(x) # need to know this at each step so calculate before
    tot_dx = 0.0
    tot_y = 0.0
    tot_xy = 0.0
    for i in 1:n
        tot_dx += (x[i] - mx)^2
        tot_y += y[i]
        tot_xy += x[i]*y[i]
    end
    b1 = (tot_xy - mx*tot_y)/tot_dx # a 1\n cancels in the top and bottom
    b0 = tot_y/n - b1*mx
    return [b0, b1]
end

If anyone has any comments on what could be made better.

Thanks,
Gabriel

On Monday, May 23, 2016 at 5:58:37 AM UTC-7, Andreas Noack wrote:
>
> Almost, but return a tuple instead of an array to avoid array allocation. 
> Tight now, cov allocates temporary arrays for the demeaned vectors but that 
> should probably be changed later anyway.
>
> On Monday, May 23, 2016 at 2:50:34 AM UTC-4, Gabriel Gellner wrote:
>>
>> So I am not able to get such a dramatic speed up. Do you mean something 
>> beyond:
>>
>> function linreg2{S <: Number, T <: Number}(x::AbstractArray{S}, 
>> y::AbstractArray{T})
>>     mx = mean(x)
>>     my = mean(y)
>>     b1 = Base.covm(x, mx, y, my)/varm(x, mx)
>>     b0 = my - b1*mx
>>     return [b0, b1]
>> end
>>
>> Which I find speeds up around 3x, or do you mean writing a custom cov 
>> function that is smarter about memory? (I am returning an array as I like 
>> to be able to do vector math on the coefficients ... but if I return a 
>> tuple it isn't much faster for me)
>>
>> On Sunday, May 22, 2016 at 8:37:43 PM UTC-7, Andreas Noack wrote:
>>>
>>> I don't think that linreg has received much attention over the years. 
>>> Most often it is almost as simple to use \. It you take a look at 
>>> linreg then I'd suggest that you try to write in terms of cov and var and 
>>> return a tuple instead of an Array. That will speed up the computation 
>>> already now and with an unallocating cov, I see 20x speed up over linreg 
>>> for n=10000 and Float64 element on 0.4.
>>>
>>> On Saturday, May 21, 2016 at 2:33:20 PM UTC-4, Gabriel Gellner wrote:
>>>>
>>>>
>>>>
>>>> I think it's an error.  The method definition should probably just be:
>>>>>
>>>>> linreg{T<:Number,S<:Number}(X::AbstractVector{T}, 
>>>>> y::AbstractVector{S}) = [ones(T, size(X,1)) X] \ y
>>>>>
>>>>> which will allow it to work for X and y of different types.
>>>>>
>>>>> Is using the more specific typing (<: Number) the current best 
>>>> practices? I notice a lot of the methods in linalg/generics.jl just check 
>>>> for AbstractVectors etc without requiring numeric types even though that 
>>>> would be the more correct check.
>>>>  
>>>>
>>>>> A PR (pull request) with a patch and a test case would be most welcome.
>>>>>
>>>> On it! Taking me a bit of sweat to figure out the build process and 
>>>> github stuff but once I have that all sorted a PR will be submitted. 
>>>> Thanks 
>>>> for the quick response. 
>>>>
>>>

Reply via email to