\ is supposed to use pivoted QR by default, which is a standard way to solve 
least squares when X is not full rank. It gives me a LAPACKException on 0.3 
too, but it works for me on 0.4:

julia> X2 = [1 2 3;
             1 2 5
             1 2 -8
             2 4 23];

julia> y = [2.3, 5.6, 9.2, 10.5];

julia> X2\y
3-element Array{Float64,1}:
  1.28112 
  2.56223 
 -0.146502

You can explicitly tell Julia to solve using pivoted QR, which works on 0.3:

julia> qrfact(X2, pivot=true)\y # on Julia 0.4: qrfact(X2, Val{true})\y
3-element Array{Float64,1}:
  1.28112 
  2.56223 
 -0.146502
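
The pivoted factorization also hands you a rank estimate and a well-conditioned 
column subset, so the "select linearly independent columns" step doesn't need 
rref either. A rough sketch in 0.3-style syntax (the F[:R]/F[:p] field access 
and the tolerance below are just one reasonable choice on my part, not the only 
way to do it):

F = qrfact(X2, pivot=true)              # on 0.4: qrfact(X2, Val{true})
d = abs(diag(F[:R]))                    # magnitudes of R's diagonal, in pivot order
tol = maximum(d) * size(X2, 2) * eps()  # crude "numerically zero" threshold
r = sum(d .> tol)                       # numerical rank estimate (2 for this X2)
keep = sort(F[:p][1:r])                 # indices of a linearly independent column subset
X2[:, keep]                             # reduced design matrix with full column rank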

Or you can solve using SVD:

julia> svdfact(X2)\y
3-element Array{Float64,1}:
  1.28112 
  2.56223 
 -0.146502
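
The SVD route is basically the pseudoinverse: singular values that are 
numerically zero get dropped and you get the minimum-norm least-squares 
solution. You can also see the rank deficiency directly from the singular 
values (essentially what rank() does), or form the pseudoinverse explicitly 
if you don't mind the extra cost:

svdvals(X2)   # singular values; the smallest is ~0 here since X2 only has rank 2
rank(X2)      # numerical rank, computed from the singular values
pinv(X2) * y  # explicit pseudoinverse; should give the same solution, just less efficiently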

Pivoted QR should be slightly faster than SVD, but both should be much 
faster than rref.
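
If you want to sanity-check that on your own problem sizes, a quick-and-dirty 
comparison (made-up dimensions here; run each line twice so you're not timing 
JIT compilation) would be something like:

A = randn(10000, 50); b = randn(10000);

@time qrfact(A, pivot=true) \ b;   # pivoted QR least squares (on 0.4: qrfact(A, Val{true}))
@time svdfact(A) \ b;              # SVD least squares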

Simon

On Wednesday, September 16, 2015 at 10:02:49 AM UTC-4, Cedric St-Jean wrote:
>
> Sorry, "singular" was the wrong word. I meant that the matrix does not 
> have full rank: the columns are not linearly independent.
>
> # Linearly independent columns
> X1 = [1 2 3;
>       1 2 5
>       1 2 -8
>       1 4 23] 
> # Linearly dependent columns (4th row changed)
> X2 = [1 2 3;
>       1 2 5
>       1 2 -8
>       2 4 23]
> y = [2.3, 5.6, 9.2, 10.5]
>
> X1 \ y
> 3-element Array{Float64,1}:
>  -8.18265 
>   6.94133 
>  -0.394898
>
> X2 \ y
> LAPACKException(2)
>
>
> This is because there is no unique solution to the linear regression 
> problem when the matrix does not have full rank. The usual solution is 
> regularization (e.g. ridge regression), but selecting a linearly independent 
> subset of the columns seems like a valid strategy too, if only prediction 
> accuracy matters.
>
> Hence: isn't that a valid non-pedagogical use of rref?
>
> Cédric
>
> On Wednesday, September 16, 2015 at 12:13:55 AM UTC-4, Steven G. Johnson 
> wrote:
>>
>>
>>
>> On Tuesday, September 15, 2015 at 10:22:47 AM UTC-4, Cedric St-Jean wrote:
>>>
>>> In the discussion of issue 9804 
>>> <https://github.com/JuliaLang/julia/pull/9804#issuecomment-140404574>, 
>>> the general consensus seemed to be:
>>>
>>> > Right. "rref" has no use in real numerical coding, but every use in 
>>> pedagogy. 
>>>
>>>
>>> I had a linear regression yield a PosDefError (the X matrix contained 
>>> only integers and was singular). I solved it by selecting a linearly 
>>> independent subset of columns 
>>> <http://math.stackexchange.com/questions/199125/how-to-remove-linearly-dependent-rows-cols>.
>>> This uses the reduced row echelon form, rref. Is there a simpler way of 
>>> doing this that I'm missing?
>>>
>>
>> A \ x, where A is a non-square matrix, finds the least-squares solution 
>> via QR. This is normally the "right" way to do regression.
>>
>> (You can also compute the QR factorization of A explicitly to get a rank 
>> estimate; google "rank revealing QR" for more info on this sort of thing.)
>>
>
