[
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094114#comment-13094114
]
Dmitriy Lyubimov commented on MAHOUT-796:
-----------------------------------------
AFAIK the distributed cache would actually do the same thing, except it would
also store the file on local disk.
The disadvantage is that we add disk I/O time. The advantage is that if we hit
the same node with more than one mapper of the same job, then as far as I
understand, they'd all have the entire B' locally. That's an interesting idea,
actually. But for big clusters, where a job is unlikely to hit the same node
with more than one task, this would probably be detrimental. Plus, if B is
really big (something like 100 GB), then we are requiring a lot of local disk
space from each node.
Plus for jobs that use memory mapping or any sort of random access, distributed
cache is the only option -- but we don't need that.
OK, let me first make an implementation that opens a stream, just to prove and
measure whatever we are improving. Later there may be good reason to add an
option to use the distributed cache for this. Maybe there is another trick to
streamline this that we don't see, but so far I have not found one. So this
will give us some time to think.
> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
> Key: MAHOUT-796
> URL: https://issues.apache.org/jira/browse/MAHOUT-796
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.5
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Labels: SSVD
> Fix For: 0.6
>
>
> Nathan Halko contacted me and pointed out the importance of having power
> iterations available, given their significant effect on the accuracy of
> smaller eigenvalues and on noise attenuation.
> Essentially, we would like to introduce yet another job parameter, q, that
> governs the number of optional power iterations. The suggested modification
> of the algorithm is outlined here:
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it differs from the original power iterations formula in the paper
> in that an additional orthogonalization is performed after each iteration.
> Nathan points out that this greatly reduces errors in the smaller eigenvalues
> (if I interpret it right).
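
For illustration only: the scratchpad linked above is not reproduced here, so the
following is a rough sketch of one common way to run q power iterations with an
extra orthogonalization after each pass, not the actual Mahout implementation.
The function names (range_basis, orthonormalize), the dense list-of-rows matrix
layout, and the use of Gram-Schmidt in place of a distributed QR step are all
assumptions made for this example. It also assumes the usual randomized SVD
notation, where B = Q'A and so B' = A'Q.

```python
import random


def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def orthonormalize(Y):
    """Return Q whose orthonormal columns span the columns of Y.

    Uses modified Gram-Schmidt as a simple stand-in for a QR step.
    Columns that are numerically dependent on earlier ones are dropped.
    """
    cols = []
    for v in transpose(Y):
        for q in cols:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-12:
            cols.append([a / norm for a in v])
    return transpose(cols)


def range_basis(A, k, q, seed=0):
    """Approximate an orthonormal basis for the range of A (m x n).

    k is the number of random projection columns and q the number of
    power iterations; Q is re-orthogonalized after every iteration.
    """
    rng = random.Random(seed)
    n = len(A[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    Q = orthonormalize(matmul(A, omega))       # Q = orth(A * Omega)
    for _ in range(q):
        Bt = matmul(transpose(A), Q)           # B' = A' * Q   (n x k)
        Q = orthonormalize(matmul(A, Bt))      # Q = orth(A * B')
    return Q
```

With q = 0 this reduces to the plain random projection step. Each extra
iteration multiplies by A and A' once more, which is why B' has to be available
to every mapper that processes a block of A.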