[ https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917402#action_12917402 ]
Dmitriy Lyubimov commented on MAHOUT-376:
-----------------------------------------
Yes, I mean that sometimes rank(Y-block) < (k+p).
OK. I don't know how often matrix A may be too sparse.
Just in case, I gave it some thought, and here is what I think may help to
account for this.
It would seem that we can address that by keeping a vector L of dimension k+p,
where L[i] = # of blocks of Q where rank(Q-block) > i.
If B' is compiled in the same pass as B' = sum[ Q^t_(i*) A_(i*) ], then it just
means that for the actual B we need to correct the rows of B as
B_(i*) = (1/L[i]) * B'_(i*). Of course, we don't actually have to correct them;
we can rather just keep in mind that B is defined not only by the data but also
by this scaling vector L, so subsequent steps can account for it.
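To make the bookkeeping concrete, here is a minimal sketch of the L vector and
the row correction in plain Python. The function names and the toy numbers are
made up for illustration and are not taken from the patch; it assumes every
L[i] is greater than zero.

```python
def rank_count_vector(block_ranks, width):
    """L[i] = number of Q-blocks whose rank exceeds i, for i in 0..width-1.

    `width` plays the role of k+p; `block_ranks` holds one rank per Q-block.
    """
    return [sum(1 for r in block_ranks if r > i) for i in range(width)]


def rescale_rows(b_prime, counts):
    """Return B with B[i] = (1 / L[i]) * B'[i].

    Assumes every L[i] > 0; a zero entry raises ZeroDivisionError here.
    """
    return [[value / counts[i] for value in row] for i, row in enumerate(b_prime)]


# Toy case: k+p = 3 and three Q-blocks, one of which is rank-deficient.
L = rank_count_vector([3, 2, 3], 3)
B_prime = [[3.0, 6.0], [3.0, 9.0], [2.0, 4.0]]
B = rescale_rows(B_prime, L)
print(L)  # [3, 3, 2]
print(B)  # [[1.0, 2.0], [1.0, 3.0], [1.0, 2.0]]
```

The third row is divided by 2 rather than 3 because only two of the three
blocks had rank greater than 2 and so contributed to it.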
Of course, as an intermediate validation step, we check whether any L[i] is 0.
If one is, it pretty much means that rank(A) < k+p and we can't get a good SVD
anyway, so we will probably raise an exception in this case and ask the user to
consider reducing oversampling or k. Or perhaps it is a bad case for
distributed computation anyway.
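A rough sketch of that validation step, again in plain Python. The function
name and the error message are hypothetical, not what the patch does:

```python
def check_rank_counts(counts):
    """Raise if any L[i] is zero, i.e. no Q-block had rank greater than i."""
    empty = [i for i, c in enumerate(counts) if c == 0]
    if empty:
        raise ValueError(
            "no Q-block has rank > %d; consider reducing k or the oversampling p"
            % empty[0]
        )
    return counts


check_rank_counts([3, 3, 2])      # passes, every entry is positive
try:
    check_rank_counts([2, 1, 0])  # the last entry is zero
except ValueError as err:
    print(err)
```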
Right now I am just sending partial L vectors as a Q row with index -1 and
summing them up in the combiner and reducer.
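For illustration only, the summing could look like the following plain-Python
stand-in for the combiner and reducer. The record layout (key, vector) and the
helper names are assumptions; the only detail taken from the above is the
reserved index -1.

```python
L_KEY = -1  # reserved row index that marks a partial L vector


def partial_counts(block_rank, width):
    """Partial L for one Q-block: 1 where rank(Q-block) > i, else 0."""
    return [1 if block_rank > i else 0 for i in range(width)]


def sum_partial_counts(records, width):
    """Element-wise sum of all vectors keyed by L_KEY.

    Addition is associative and commutative, so the same function can serve
    as both combiner and reducer.
    """
    total = [0] * width
    for key, vec in records:
        if key == L_KEY:
            total = [t + v for t, v in zip(total, vec)]
    return total


width = 3  # stands in for k+p
# Two mappers: the first saw blocks of rank 3 and 2, the second one of rank 3.
mapper_a = [(L_KEY, partial_counts(3, width)), (L_KEY, partial_counts(2, width))]
mapper_b = [(L_KEY, partial_counts(3, width))]

# Combiner runs per mapper, reducer sums the combined outputs.
combined = [(L_KEY, sum_partial_counts(m, width)) for m in (mapper_a, mapper_b)]
print(sum_partial_counts(combined, width))  # [3, 3, 2]
```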
> Implement Map-reduce version of stochastic SVD
> ----------------------------------------------
>
> Key: MAHOUT-376
> URL: https://issues.apache.org/jira/browse/MAHOUT-376
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Fix For: 0.5
>
> Attachments: MAHOUT-376.patch, sd-bib.bib, sd.pdf, sd.tex, Stochastic
> SVD using eigensolver trick.pdf
>
>
> See attached pdf for outline of proposed method.
> All comments are welcome.