I haven't timed it but I think the final array allocation could make a difference if you make a lot of regressions of moderate size. I don't share your view that the output is ugly. Since the function only allows for simple regression anyway, I think it is natural to save the two parameters in separate variables, e.g.
α, β = linreg(x,y) In many cases, you are only interested in β. However, you are right that this is actually a separate issue from the one you raised initially so let's just fix that for now and discuss optimizations in a later PR. On Mon, May 23, 2016 at 11:37 AM, Gabriel Gellner <[email protected]> wrote: > Okay so I have completely hand coded the cov/var calculation to avoid any > overhead, and now I get around the 20x speedup you mentioned. I really > don't see any significant benefit of returning a tuple, allocating the 2 > element array has an insignificant overhead even for flot64 vectors of size > 100000 inside hot loops. I really don't like the ugly output and lack > definitions of arithmatic that returning the tuple forces for no clear > benefit in my mind. If this is really wanted I would much rather have a > linreg! method, so that the default behavior is not effected by this > optimization. I will of course change this if the will of the people is to > have a tuple, but I am -1. > > Another quick question do I make a new issue/pull request for this > behavior? As it is no longer the trivial change of my current pull request. > Also should this function be moved out of linalg.jl since it is really not > using the linear algebra routines anymore? > > The code I have come up with is: > > function linreg5{S <: Number, T <: Number}(x::AbstractArray{S}, > y::AbstractArray{T}) > n = length(x) > @assert n == length(y) > mx = mean(x) # need to know this at each step so calculate before > tot_dx = 0.0 > tot_y = 0.0 > tot_xy = 0.0 > for i in 1:n > tot_dx += (x[i] - mx)^2 > tot_y += y[i] > tot_xy += x[i]*y[i] > end > b1 = (tot_xy - mx*tot_y)/tot_dx # a 1\n cancels in the top and bottom > b0 = tot_y/n - b1*mx > return [b0, b1] > end > > If anyone has any comments on what could be made better. > > Thanks, > Gabriel > > On Monday, May 23, 2016 at 5:58:37 AM UTC-7, Andreas Noack wrote: >> >> Almost, but return a tuple instead of an array to avoid array allocation. >> Tight now, cov allocates temporary arrays for the demeaned vectors but that >> should probably be changed later anyway. >> >> On Monday, May 23, 2016 at 2:50:34 AM UTC-4, Gabriel Gellner wrote: >>> >>> So I am not able to get such a dramatic speed up. Do you mean something >>> beyond: >>> >>> function linreg2{S <: Number, T <: Number}(x::AbstractArray{S}, >>> y::AbstractArray{T}) >>> mx = mean(x) >>> my = mean(y) >>> b1 = Base.covm(x, mx, y, my)/varm(x, mx) >>> b0 = my - b1*mx >>> return [b0, b1] >>> end >>> >>> Which I find speeds up around 3x, or do you mean writing a custom cov >>> function that is smarter about memory? (I am returning an array as I like >>> to be able to do vector math on the coefficients ... but if I return a >>> tuple it isn't much faster for me) >>> >>> On Sunday, May 22, 2016 at 8:37:43 PM UTC-7, Andreas Noack wrote: >>>> >>>> I don't think that linreg has received much attention over the years. >>>> Most often it is almost as simple to use \. It you take a look at >>>> linreg then I'd suggest that you try to write in terms of cov and var and >>>> return a tuple instead of an Array. That will speed up the computation >>>> already now and with an unallocating cov, I see 20x speed up over linreg >>>> for n=10000 and Float64 element on 0.4. >>>> >>>> On Saturday, May 21, 2016 at 2:33:20 PM UTC-4, Gabriel Gellner wrote: >>>>> >>>>> >>>>> >>>>> I think it's an error. The method definition should probably just be: >>>>>> >>>>>> linreg{T<:Number,S<:Number}(X::AbstractVector{T}, >>>>>> y::AbstractVector{S}) = [ones(T, size(X,1)) X] \ y >>>>>> >>>>>> which will allow it to work for X and y of different types. >>>>>> >>>>>> Is using the more specific typing (<: Number) the current best >>>>> practices? I notice a lot of the methods in linalg/generics.jl just check >>>>> for AbstractVectors etc without requiring numeric types even though that >>>>> would be the more correct check. >>>>> >>>>> >>>>>> A PR (pull request) with a patch and a test case would be most >>>>>> welcome. >>>>>> >>>>> On it! Taking me a bit of sweat to figure out the build process and >>>>> github stuff but once I have that all sorted a PR will be submitted. >>>>> Thanks >>>>> for the quick response. >>>>> >>>>
