Re: [julia-users] Re: linreg strangely picky about inputs

Andreas Noack Mon, 23 May 2016 08:46:42 -0700

I haven't timed it but I think the final array allocation could make a
difference if you make a lot of regressions of moderate size. I don't share
your view that the output is ugly. Since the function only allows for
simple regression anyway, I think it is natural to save the two parameters
in separate variables, e.g.


α, β = linreg(x,y)

In many cases, you are only interested in β. However, you are right that
this is actually a separate issue from the one you raised initially so
let's just fix that for now and discuss optimizations in a later PR.

On Mon, May 23, 2016 at 11:37 AM, Gabriel Gellner <[email protected]>
wrote:

> Okay so I have completely hand coded the cov/var calculation to avoid any
> overhead, and now I get around the 20x speedup you mentioned. I really
> don't see any significant benefit of returning a tuple, allocating the 2
> element array has an insignificant overhead even for flot64 vectors of size
> 100000 inside hot loops. I really don't like the ugly output and lack
> definitions of arithmatic that returning the tuple forces for no clear
> benefit in my mind. If this is really wanted I would much rather have a
> linreg! method, so that the default behavior is not effected by this
> optimization. I will of course change this if the will of the people is to
> have a tuple, but I am -1.
>
> Another quick question do I make a new issue/pull request for this
> behavior? As it is no longer the trivial change of my current pull request.
> Also should this function be moved out of linalg.jl since it is really not
> using the linear algebra routines anymore?
>
> The code I have come up with is:
>
> function linreg5{S <: Number, T <: Number}(x::AbstractArray{S},
> y::AbstractArray{T})
>     n = length(x)
>     @assert n == length(y)
>     mx = mean(x) # need to know this at each step so calculate before
>     tot_dx = 0.0
>     tot_y = 0.0
>     tot_xy = 0.0
>     for i in 1:n
>         tot_dx += (x[i] - mx)^2
>         tot_y += y[i]
>         tot_xy += x[i]*y[i]
>     end
>     b1 = (tot_xy - mx*tot_y)/tot_dx # a 1\n cancels in the top and bottom
>     b0 = tot_y/n - b1*mx
>     return [b0, b1]
> end
>
> If anyone has any comments on what could be made better.
>
> Thanks,
> Gabriel
>
> On Monday, May 23, 2016 at 5:58:37 AM UTC-7, Andreas Noack wrote:
>>
>> Almost, but return a tuple instead of an array to avoid array allocation.
>> Tight now, cov allocates temporary arrays for the demeaned vectors but that
>> should probably be changed later anyway.
>>
>> On Monday, May 23, 2016 at 2:50:34 AM UTC-4, Gabriel Gellner wrote:
>>>
>>> So I am not able to get such a dramatic speed up. Do you mean something
>>> beyond:
>>>
>>> function linreg2{S <: Number, T <: Number}(x::AbstractArray{S},
>>> y::AbstractArray{T})
>>>     mx = mean(x)
>>>     my = mean(y)
>>>     b1 = Base.covm(x, mx, y, my)/varm(x, mx)
>>>     b0 = my - b1*mx
>>>     return [b0, b1]
>>> end
>>>
>>> Which I find speeds up around 3x, or do you mean writing a custom cov
>>> function that is smarter about memory? (I am returning an array as I like
>>> to be able to do vector math on the coefficients ... but if I return a
>>> tuple it isn't much faster for me)
>>>
>>> On Sunday, May 22, 2016 at 8:37:43 PM UTC-7, Andreas Noack wrote:
>>>>
>>>> I don't think that linreg has received much attention over the years.
>>>> Most often it is almost as simple to use \. It you take a look at
>>>> linreg then I'd suggest that you try to write in terms of cov and var and
>>>> return a tuple instead of an Array. That will speed up the computation
>>>> already now and with an unallocating cov, I see 20x speed up over linreg
>>>> for n=10000 and Float64 element on 0.4.
>>>>
>>>> On Saturday, May 21, 2016 at 2:33:20 PM UTC-4, Gabriel Gellner wrote:
>>>>>
>>>>>
>>>>>
>>>>> I think it's an error.  The method definition should probably just be:
>>>>>>
>>>>>> linreg{T<:Number,S<:Number}(X::AbstractVector{T},
>>>>>> y::AbstractVector{S}) = [ones(T, size(X,1)) X] \ y
>>>>>>
>>>>>> which will allow it to work for X and y of different types.
>>>>>>
>>>>>> Is using the more specific typing (<: Number) the current best
>>>>> practices? I notice a lot of the methods in linalg/generics.jl just check
>>>>> for AbstractVectors etc without requiring numeric types even though that
>>>>> would be the more correct check.
>>>>>
>>>>>
>>>>>> A PR (pull request) with a patch and a test case would be most
>>>>>> welcome.
>>>>>>
>>>>> On it! Taking me a bit of sweat to figure out the build process and
>>>>> github stuff but once I have that all sorted a PR will be submitted. 
>>>>> Thanks
>>>>> for the quick response.
>>>>>
>>>>

Re: [julia-users] Re: linreg strangely picky about inputs

Reply via email to