[jira] Commented: (MAHOUT-376) Implement Map-reduce version of stochastic SVD

Dmitriy Lyubimov (JIRA) Sun, 03 Oct 2010 14:13:03 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917402#action_12917402
 ]


Dmitriy Lyubimov commented on MAHOUT-376:
-----------------------------------------

Yes, I mean that sometimes rank(Y-block) < k+p.

OK. I don't know how often matrix A will be sparse enough for this to happen.

Just in case, I gave it some thought, and here's what I think may help to
account for this.

It would seem that we can address this by keeping a vector L of dimension k+p,
where L[i] = number of blocks of Q with rank(Q-block) > i.

If B' is compiled in the same pass as B' = sum[ Q^t_(i*) A_(i*) ], then it just
means that for the actual B we need to correct the rows of B as
B_(i*) = (1/L[i]) * B'_(i*). Of course, we don't actually have to correct them;
we can just keep in mind that B is defined not only by the data but also by
this scaling vector L, so subsequent steps can account for it.

Of course, as an intermediate validation step, we check whether any L[i] is 0.
If one is, it pretty much means that rank(A) < k+p and we can't get a good SVD
anyway, so we will probably raise an exception in that case and ask the user to
consider reducing oversampling or k. Or perhaps it is a bad case for
distributed computation anyway.
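
To make the bookkeeping concrete, here is a minimal sketch in plain Python (not
the Mahout code). The function names and the sample numbers are made up for
illustration, indices are 0-based, and B' is assumed to be already summed over
all blocks.

```python
def rank_counts(block_ranks, width):
    """L[i] = number of Q blocks whose rank exceeds i, for i in 0..width-1."""
    return [sum(1 for r in block_ranks if r > i) for i in range(width)]


def rescale_rows(b_prime, counts):
    """Return B with B[i] = B'[i] / L[i]; fail if any L[i] is 0."""
    for i, c in enumerate(counts):
        if c == 0:
            # No block contributed to row i, so the row cannot be recovered.
            raise ValueError(
                "L[%d] == 0: rank(A) is probably below k+p; "
                "consider reducing oversampling or k" % i
            )
    return [[x / c for x in row] for row, c in zip(b_prime, counts)]


# Hypothetical case: k+p = 3, three blocks, the second one rank-deficient.
block_ranks = [3, 2, 3]
counts = rank_counts(block_ranks, 3)   # row 2 only gets two contributions

# Hypothetical accumulated B' (3 rows, 2 columns).
b_prime = [[3.0, 6.0], [3.0, 9.0], [2.0, 4.0]]
b = rescale_rows(b_prime, counts)
```

As noted above, the division does not have to be applied eagerly; keeping
`counts` next to B' and applying it lazily in later steps would work as well.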

Right now I am just sending partial L vectors as a Q row with index -1 and
summing them up in the combiner and reducer.
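
A rough illustration of that idea, again in plain Python and not the actual
Hadoop job: records are assumed to be (row index, vector) pairs, and the name
L_ROW_KEY and the function combine are hypothetical. Since vector addition is
associative and commutative, the same summing logic can run in both the
combiner and the reducer.

```python
L_ROW_KEY = -1  # reserved row index that marks a partial L vector


def combine(records):
    """Sum all partial L vectors; pass ordinary rows through unchanged."""
    total = None
    passthrough = []
    for key, vec in records:
        if key == L_ROW_KEY:
            if total is None:
                total = list(vec)
            else:
                total = [a + b for a, b in zip(total, vec)]
        else:
            passthrough.append((key, vec))
    if total is not None:
        # Emit a single merged L vector under the reserved key.
        passthrough.append((L_ROW_KEY, total))
    return passthrough


# Hypothetical mapper output: two ordinary rows plus two partial L vectors.
records = [
    (0, [0.5, 0.1]),
    (L_ROW_KEY, [1, 1, 0]),
    (1, [0.2, 0.7]),
    (L_ROW_KEY, [1, 1, 1]),
]
combined = combine(records)
```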


> Implement Map-reduce version of stochastic SVD
> ----------------------------------------------
>
>                 Key: MAHOUT-376
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-376
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>             Fix For: 0.5
>
>         Attachments: MAHOUT-376.patch, sd-bib.bib, sd.pdf, sd.tex, Stochastic 
> SVD using eigensolver trick.pdf
>
>
> See attached pdf for outline of proposed method.
> All comments are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
