Please look at this survey [1]. I am currently reading thru it. Quite
comprehensive.

Where it fits us: ssvd, dssvd depend on QR accuracy quite a bit.

What we have right now in "new" algebra DSL is shown there as algorithm 9,
"Cholesky QR".

The benefit is that it is extremely easy and parallelizable. The problem
with that guy is that it is quite a bit more unstable compared to
Householder's or Given's (you can test it even with non-distributed
matrices, the difference quite apparent, especially in cases that start
apporaching singularity).

What we had in MR version was something close to what they describe as
TSQR. Our approach is detailed in Nathan Halko's dissertation (reference is
given on SSVD page). Our "older" MR version is much better in accuracy.

It is different from TSQR in quite a few things.

First, it is using reordered Given's QR for both reductions and blocking
QR. using givens QR for blocks allows for sequential processing of input.
Householder requires pretty much entire matrix loaded first.

Second, it develops algorithm for N-way merges instead of binary merges as
defined by TSQR. I haven't finish reading implementation details on TSQR,
but basically what it seems to mean is that TSQR requires dynamic
scheduling for merge tasks to stay at maximum parallelization efficiency.
N-way formulation allows to run more than 2 tasks before each reduction
step, hence, the depth of the tree could be much more shallow (log base N
instead of log base 2) of number of blocks.

There's also a CAQR algorithm there which i haven't even touched. I think
it is worth to read thru that.

Bottom line, what i think is that

(1) we need to implement a well parallelized variation of one of TSQR, CAQR
or Mahout MR N-way Givens QR as a more numerically stable alternative to
CholeskyQR (aka "thinQR" in the sparkbindings algebra);

(2) I think some of approaches in CAQR must lead to parallelization of
non-thin decompositions;

(3) they also consider approaches to distributed LU stuff which can open a
door to solving "non-thin" Ridge regressions and such.

Looking for takers and opinions on CAQR and Mahout's n-way Givens for the
sparkbindings.

[1] http://arxiv.org/pdf/0806.2159.pdf

Reply via email to