[
https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917010#action_12917010
]
Dmitriy Lyubimov commented on MAHOUT-376:
-----------------------------------------
I have a couple of doubts. I do an amended Gram-Schmidt on the blocks of Y to
produce blocks of Q, but while Q would end up orthonormal, I am not sure that Q
and Y would end up spanning the same space. Then again, since Y is a random
product, Q may also be a more or less random basis, so maybe it doesn't matter
so much that span(Q) is exactly span(Y).
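To make the first doubt concrete, here is a minimal sketch in plain Python. It
assumes that "blockwise" means running Gram-Schmidt independently on each row
block of Y and stacking the results; that is only one reading of the comment,
not necessarily what the patch does. All names (gram_schmidt, KP,
ROWS_PER_BLOCK, N_BLOCKS) are made up for illustration. Under that assumption
the stacked Q has orthogonal columns (Q'Q = N_BLOCKS * I, so it is orthonormal
after scaling), but a column of Y generally leaves a nonzero residual outside
span(Q).

```python
import random

def gram_schmidt(block):
    """Modified Gram-Schmidt on the columns of `block` (a list of rows)."""
    n_rows, n_cols = len(block), len(block[0])
    cols = [[block[i][j] for i in range(n_rows)] for j in range(n_cols)]
    q_cols = []
    for v in cols:
        v = v[:]
        for q in q_cols:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        q_cols.append([a / norm for a in v])
    return [[q_cols[j][i] for j in range(n_cols)] for i in range(n_rows)]

def gram(rows):
    """Return M'M for a matrix M given as a list of rows."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]

def residual_outside_span(q_rows, y_col, n_blocks):
    """Norm of y minus its projection onto span(Q).

    Relies on Q'Q = n_blocks * I, which holds when Q is a stack of
    n_blocks blocks that each have orthonormal columns.
    """
    n = len(q_rows[0])
    coeffs = [sum(q[j] * y for q, y in zip(q_rows, y_col)) / n_blocks
              for j in range(n)]
    resid = [y - sum(c * qj for c, qj in zip(coeffs, q))
             for q, y in zip(q_rows, y_col)]
    return sum(r * r for r in resid) ** 0.5

random.seed(0)
KP, ROWS_PER_BLOCK, N_BLOCKS = 3, 8, 4  # hypothetical k+p and block sizes

# Random stand-in for Y, split into row blocks (assumption: Gaussian entries).
y_blocks = [[[random.gauss(0, 1) for _ in range(KP)]
             for _ in range(ROWS_PER_BLOCK)] for _ in range(N_BLOCKS)]
y_rows = [row for blk in y_blocks for row in blk]
q_rows = [row for blk in y_blocks for row in gram_schmidt(blk)]

g = gram(q_rows)                       # expected to be close to N_BLOCKS * I
first_col = [r[0] for r in y_rows]
resid = residual_outside_span(q_rows, first_col, N_BLOCKS)
print("Q'Q diagonal:", [round(g[i][i], 6) for i in range(KP)])
print("part of Y's first column outside span(Q):", resid)
```

With a single block the residual is zero up to rounding, so the gap comes
purely from each block having its own triangular factor.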
The second concern is still the case where the last split produced by MR
doesn't have the minimally sufficient k+p records of A needed to produce an
orthogonal Q. The ideal outcome would be to just add it to another split, but I
can't figure out an easy enough way to do that within the MR framework
(especially if the input is serialized as a compressed sequence file). One way
is to do custom split indexing based on the number of records encountered
(similar to what that lzo MR project does), but that sounds too complicated to
me. Another way is to do a pre-pass over A and prepartition it so that this
condition is satisfied, then use a custom split so that there is one mapper per
partition. But that is still an additional preprocessing step which we'd run
for the sake of just a fraction of A.
Ideas are welcome here.
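The prepartitioning idea can be sketched as plain record-count arithmetic. This
is only an illustration in Python and does not touch any Hadoop API: the
function name partition_bounds and its parameters are hypothetical, and it
assumes the pre-pass has already counted the records of A. A tail partition
with fewer than k+p records is folded into the one before it.

```python
def partition_bounds(n_records, split_size, min_rows):
    """Return half-open (start, end) record ranges covering n_records.

    Every range holds at least min_rows records (min_rows plays the role
    of k+p). A short tail is merged into the previous range.
    """
    if split_size < min_rows:
        raise ValueError("split_size must be at least min_rows")
    if n_records < min_rows:
        raise ValueError("not enough records for even one partition")
    bounds = [(start, min(start + split_size, n_records))
              for start in range(0, n_records, split_size)]
    last_start, last_end = bounds[-1]
    if len(bounds) > 1 and last_end - last_start < min_rows:
        # Fold the short tail into the previous partition.
        prev_start, _ = bounds[-2]
        bounds[-2:] = [(prev_start, n_records)]
    return bounds

# Example: 1030 records, splits of 500, k+p = 110. The 30-record tail
# is absorbed by the second partition.
print(partition_bounds(1030, 500, 110))
```

The cost noted above still applies: the record count has to come from a full
pass over A before the real job starts.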
> Implement Map-reduce version of stochastic SVD
> ----------------------------------------------
>
> Key: MAHOUT-376
> URL: https://issues.apache.org/jira/browse/MAHOUT-376
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Fix For: 0.5
>
> Attachments: MAHOUT-376.patch, sd-bib.bib, sd.pdf, sd.tex, Stochastic
> SVD using eigensolver trick.pdf
>
>
> See attached pdf for outline of proposed method.
> All comments are welcome.