Matei, If the data has linearly dependent rows ALS should have a failback mechanism. Either remove the rows and then call BLAS posv or call BLAS gesv or Breeze QR decomposition.
I can share the analysis over email. Thanks. Deb On Thu, Mar 6, 2014 at 9:39 AM, Matei Zaharia <matei.zaha...@gmail.com>wrote: > Xt*X should mathematically always be positive semi-definite, so the only > way this might be bad is if it's not invertible due to linearly dependent > rows. This might happen due to the initialization or possibly due to > numerical issues, though it seems unlikely. Maybe it also happens if some > users rated no products, or something like that. > > Matei > > On Mar 6, 2014, at 7:59 AM, Sean Owen <so...@cloudera.com> wrote: > > > Hmm, Will Xt*X be positive definite in all cases? For example it's not > > if X has linearly independent rows? (I'm not going to guarantee 100% > > that I haven't missed something there.) > > > > Even though your data is huge, if it was generated by some synthetic > > process, maybe it is very low rank? > > > > QR decomposition is pretty good here, yes. > > -- > > Sean Owen | Director, Data Science | London > > > > > > On Thu, Mar 6, 2014 at 3:05 PM, Debasish Das <debasish.da...@gmail.com> > wrote: > >> Hi Sebastian, > >> > >> Yes Mahout ALS and Oryx runs fine on the same matrix because Sean calls > QR > >> decomposition. > >> > >> But the ALS objective should give us strictly positive definite > matrix..I > >> am thinking more on it.. > >> > >> There are some random factor assignment step but that also initializes > >> factors with normal(0,1)...which I think is not a big deal... > >> > >> About QR decomposition, jblas has Solve.solve and > >> Solve.solvePositive...solve should also run fine but it does a LU > >> factorization and better way will be to do a QR decomposition. > >> > >> Seems breeze has QR decomposition and we can make use of that...But QR > by > >> default is also not correct since if the matrix is positive definite > BLAS > >> psov (Solve.solvePositive) is much faster due to cholesky computation.. > >> > >> I believe we need a singular value check and based on that we should > call > >> solvePositive/solve or qr from breeze.. > >> > >> There is also a specialized version of TSQR (tall and skinny QR > >> decomposition) from Chris which might be good to evaluate as well: > >> > >> https://github.com/ccsevers/scalding-linalg > >> > >> I am going to debug it further and try to understand why I am getting > non > >> positive definite matrix and publish the findings... > >> > >> Any suggestions on how to proceed further on this ? Should I ask for a > PR > >> and we can discuss more on it ? > >> > >> Thanks. > >> Deb > >> > >> On Thu, Mar 6, 2014 at 6:47 AM, Sebastian Schelter <s...@apache.org> > wrote: > >> > >>> I'm not sure about the mathematical details, but I found in some > >>> experiments with Mahout that the matrix there was also not positive > >>> definite. Therefore, we chose QR decomposition to solve the linear > system. > >>> > >>> > >>> --sebastian > >>> > >>> > >>> On 03/06/2014 03:44 PM, Debasish Das wrote: > >>> > >>>> Hi, > >>>> > >>>> I am running ALS on a sparse problem (10M x 1M) and I am getting the > >>>> following error: > >>>> > >>>> org.jblas.exceptions.LapackArgumentException: LAPACK DPOSV: Leading > minor > >>>> of order i of A is not positive definite. > >>>> at org.jblas.SimpleBlas.posv(SimpleBlas.java:373) > >>>> at org.jblas.Solve.solvePositive(Solve.java:68) > >>>> > >>>> This error from blas shows up if the hessian matrix is not positive > >>>> definite... > >>>> > >>>> I checked that rating matrix is all > 0 but of course like netflix > they > >>>> are > >>>> not bounded within 1 and 5.... > >>>> > >>>> Is there some sort of specific initialization of factor matrices done > >>>> which > >>>> can make the hessian matrix non positive definite ? > >>>> > >>>> I am printing out the eigen vectors and fullXtX matrix to understand > it > >>>> more but any help will be appreciated. > >>>> > >>>> Thanks. > >>>> Deb > >>>> > >>>> > >>> > >