[ https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917010#action_12917010 ]

Dmitriy Lyubimov commented on MAHOUT-376:
-----------------------------------------

I have a couple of doubts. I run an amended Gram-Schmidt over the blocks of Y
to produce blocks of Q, but while Q would end up orthonormal, I am not sure
that Q and Y would end up spanning the same space. On the other hand, since Y
is a random product, Q may also be a more or less random basis, so maybe it
doesn't matter much whether span(Q) = span(Y) exactly.
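For reference, the textbook property behind the span question: plain (modified)
Gram-Schmidt over the columns of a single full-rank block Y gives Y = QR with R
upper triangular and nonsingular, so span(Q) = span(Y) for that block. Below is
a minimal pure-Python sketch, assuming a block is a dense list of rows;
mgs_qr is a hypothetical helper, not the amended variant in MAHOUT-376.patch.
It also shows why a block needs at least k+p rows: a block with r rows has rank
at most min(r, k+p), so with r < k+p some column collapses to zero.

```python
import math

def mgs_qr(y):
    """Modified Gram-Schmidt on the columns of block y (a list of rows).

    Returns (q, r) with y = q * r, q having orthonormal columns and r upper
    triangular. Hypothetical helper for illustration only. Raises ValueError
    if a column becomes (near) zero, i.e. the block is rank deficient, which
    always happens when y has fewer rows than columns.
    """
    m, n = len(y), len(y[0])
    # cols[j] is a working copy of column j of y; y itself is left untouched.
    cols = [[y[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(v * v for v in cols[j]))
        if norm < 1e-12:
            raise ValueError("block is rank deficient at column %d" % j)
        r[j][j] = norm
        cols[j] = [v / norm for v in cols[j]]
        # Remove the new direction from every remaining column right away;
        # this is what distinguishes modified from classical Gram-Schmidt.
        for k in range(j + 1, n):
            dot = sum(a * b for a, b in zip(cols[j], cols[k]))
            r[j][k] = dot
            cols[k] = [b - dot * a for a, b in zip(cols[j], cols[k])]
    # Transpose the orthonormal columns back into row-major form.
    q = [[cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

Because r is upper triangular with a nonzero diagonal, it is invertible, so
each column of q is a combination of columns of y and vice versa. Whether the
same argument carries over to an amended, blockwise scheme is a separate matter.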

My second concern is still the case where the last split produced by MR
doesn't have the minimum of k+p records of A needed to produce an orthogonal Q.
The ideal outcome would then be simply to add it to another split, but I can't
figure out an easy enough way to do that within the MR framework (especially
if the input is serialized as a compressed sequence file). One way is to do
custom split indexing based on the number of records encountered (similar to
what that LZO MR project does), but that sounds too complicated to me. Another
way is to do a pre-pass over A and pre-partition it so that this condition is
satisfied, then have a custom split so that there's one mapper per partition.
But that's still one additional preprocessing step which we'd add just for the
sake of a fraction of A. Ideas are welcome here.
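The pre-pass option could reduce to a small planning step: given per-split row
counts of A (collected in the pre-pass), group consecutive splits so that every
group, and hence every mapper, sees at least k+p rows, folding a short tail
into the previous group. This is a sketch under assumptions: plan_partitions
and its inputs are hypothetical, it uses no Hadoop API, and it ignores byte
boundaries and compression of the underlying sequence files.

```python
def plan_partitions(split_rows, min_rows):
    """Group consecutive splits so each group holds at least min_rows rows.

    split_rows: row counts per split, in input order (hypothetical pre-pass
    output). min_rows: k + p. Returns a list of groups of split indices; a
    trailing group that is too small is merged into the previous group.
    """
    groups, current, count = [], [], 0
    for idx, rows in enumerate(split_rows):
        current.append(idx)
        count += rows
        if count >= min_rows:
            groups.append(current)
            current, count = [], 0
    if current:
        if not groups:
            # Even all of A together is too short for one orthonormal block.
            raise ValueError("A has fewer than k + p rows in total")
        # Short tail (e.g. an undersized last split): attach to previous group.
        groups[-1].extend(current)
    return groups
```

For example, with k + p = 110 and splits of 1000, 1000 and 7 rows, the plan is
[[0], [1, 2]]: the 7-row tail is handed to the mapper for split 1.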

> Implement Map-reduce version of stochastic SVD
> ----------------------------------------------
>
>                 Key: MAHOUT-376
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-376
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>             Fix For: 0.5
>
>         Attachments: MAHOUT-376.patch, sd-bib.bib, sd.pdf, sd.tex, Stochastic SVD using eigensolver trick.pdf
>
>
> See attached pdf for outline of proposed method.
> All comments are welcome.

-- 
This message is automatically generated by JIRA.