Hi Sean,

I simply used the Solve.solve() method; I guess it uses a QR decomposition internally. I can provide a copy of the code if you want to have a look.
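[Editor's note: Sebastian offers to share his code above; what follows is not that code, but a minimal, hypothetical sketch of the kind of dense solve being discussed. With jblas the whole step is one call, `Solve.solve(A, b)`, which (as far as I recall) delegates to LAPACK's gesv, an LU-based solver rather than QR. The plain-Java Gaussian elimination below stands in for that native routine purely for illustration.]

```java
// Minimal dense Ax = b solver: Gaussian elimination with partial
// pivoting. A stand-in for the LAPACK routine that jblas's
// Solve.solve() calls; assumes A is square and nonsingular.
public class DenseSolve {

    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        // Work on copies so the caller's data is untouched.
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) m[i] = a[i].clone();
        double[] x = b.clone();

        for (int col = 0; col < n; col++) {
            // Partial pivoting: pick the largest remaining entry in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(m[row][col]) > Math.abs(m[pivot][col])) pivot = row;
            double[] tmpRow = m[col]; m[col] = m[pivot]; m[pivot] = tmpRow;
            double tmp = x[col]; x[col] = x[pivot]; x[pivot] = tmp;

            // Eliminate entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double f = m[row][col] / m[col][col];
                for (int k = col; k < n; k++) m[row][k] -= f * m[col][k];
                x[row] -= f * x[col];
            }
        }

        // Back-substitution on the upper-triangular system.
        for (int row = n - 1; row >= 0; row--) {
            for (int k = row + 1; k < n; k++) x[row] -= m[row][k] * x[k];
            x[row] /= m[row][row];
        }
        return x;
    }

    public static void main(String[] args) {
        // 4x + y = 1, x + 3y = 2  -->  solution approximately (1/11, 7/11)
        double[] x = solve(new double[][] {{4, 1}, {1, 3}}, new double[] {1, 2});
        System.out.println(x[0] + " " + x[1]);
    }
}
```

In the ALS inner loop this solve is performed once per user (and once per item) per iteration, which is why handing the small dense system to an optimized native LAPACK pays off so visibly.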
Best,
Sebastian

On 18.04.2013 22:56, Sean Owen wrote:
> I'm always interested in optimizing the bit where you solve Ax = b, which
> I so recently went on about. That's a dense-matrix problem. Is there a
> QR decomposition available?
>
> I tried getting this part to run on a GPU, and it worked, but wasn't
> faster. Somehow it was still slower to push the smallish dense matrix
> onto the card so many times per second. The same issue is identified here,
> so I'm interested to hear whether the direct buffer approach makes this
> a win.
>
> On Thu, Apr 18, 2013 at 9:51 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> I looked at jblas some time ago, a year or two back.
>>
>> It's a fast bridge to LAPACK, and LAPACK is very hard to beat. But I
>> think I convinced myself it lacks support for sparse matrices. It will
>> still work nicely, though, for many blockified algorithms such as ALS-WR,
>> which try to avoid doing BLAS level 3 operations on sparse data.
>>
>> On Thu, Apr 18, 2013 at 1:45 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>
>>> BTW, did this include the changes I made in trunk recently? I would also
>>> like to profile that code and see if we can squeeze more out of our
>>> Vectors and Matrices. Could you point me to how I can run the 1M example?
>>>
>>> Robin
>>>
>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>>
>>> On Thu, Apr 18, 2013 at 3:43 PM, Robin Anil <robin.a...@gmail.com> wrote:
>>>
>>>> I was just emailing something similar on Mahout (see my email). I saw
>>>> the TU Berlin name and thought you would do something about it :) This
>>>> is excellent. Investigating this is maybe one of the next-generation
>>>> pieces of work on our Vectors.
>>>>
>>>> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>>>>
>>>> On Thu, Apr 18, 2013 at 3:37 PM, Sebastian Schelter <s...@apache.org> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> With regard to Robin mentioning JBlas [1] recently when we talked about
>>>>> the performance of our vector operations, I ported the solving code for
>>>>> ALS to JBlas today and got some awesome results.
>>>>>
>>>>> For the MovieLens 1M dataset and a factorization of rank 100, the
>>>>> runtime per iteration dropped from 50 seconds to less than 7 seconds. I
>>>>> will run some tests with the distributed version and larger datasets in
>>>>> the next few days, but from what I've seen, we should really take a
>>>>> closer look at JBlas, at least for operations on dense matrices.
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> [1] http://mikiobraun.github.io/jblas/
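[Editor's note: on the "direct buffer approach" Sean mentions above, here is a minimal sketch of the underlying idea, under the assumption that the point is to keep matrix data in off-heap memory that native code (LAPACK via JNI, or a GPU driver) can read without an extra copy. The class and values are illustrative only.]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// A direct ByteBuffer lives outside the JVM heap, so native code can
// access it in place instead of the JNI layer copying a Java array on
// every call -- the per-call transfer cost Sean describes.
public class DirectBufferDemo {
    public static void main(String[] args) {
        int n = 4;
        // Allocate off-heap storage for n doubles, in the platform's
        // native byte order (what a native BLAS/LAPACK would expect).
        ByteBuffer raw = ByteBuffer.allocateDirect(n * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        DoubleBuffer buf = raw.asDoubleBuffer();

        // Fill the buffer; native code handed this buffer would see the
        // same memory without any copying.
        for (int i = 0; i < n; i++) buf.put(i, i * 1.5);

        System.out.println(buf.isDirect() + " " + buf.get(2));
    }
}
```

The trade-off is that direct buffers are more expensive to allocate and are not garbage-collected as promptly as heap arrays, so they pay off when one buffer is reused across many native calls, exactly the "many times per second" pattern in the ALS inner loop.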