[ 
https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000665#comment-13000665
 ] 

Dmitriy Lyubimov commented on MAHOUT-593:
-----------------------------------------

{quote}My only question then is why those intermediate stages of 
Mappers/Reducers need to be exposed as stand-alone units ("Jobs" in your 
patch)? I agree they're not command-line "Jobs" that would be invoked 
independently, but they seem exposed that way.{quote}
I don't think they are exposed. Here is the class diagram.

There's only one CLI entity (SSVDCli), which is a Tool as well as an 
AbstractJob. SSVDCli is basically a CLI adapter to the SSVDSolver API. The 
SSVDSolver API can be used inline in a program just like a regular solver (the 
distinction is that the DRM input is specified by a Hadoop glob expression).

SSVDSolver encapsulates the overarching functionality of SSVD by driving the 
map-reduce jobs as well as a small front-end computation (the latter by means 
of instantiating an EigenSolverWrapper, which solves BB'=UΛU'). None of this is 
exposed through either the CLI or the Solver API. The function of SSVDCli is to 
parse and establish job-specific parameters as well as Hadoop's Configuration. 
(The solver may override some of them when passing them on to the jobs.)
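
To illustrate what the front-end step yields: if B has the decomposition B=UΣV', then BB'=UΣ²U', so the eigenvalues in Λ are the squares of B's singular values. Below is a minimal sketch of that conversion, assuming the eigenvalues have already been produced by some eigensolver. The class name EigenToSingular is hypothetical and is not part of the patch.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not a class from the patch.
final class EigenToSingular {

    /** Converts eigenvalues of BB' into singular values of B, largest first. */
    static double[] singularValues(double[] eigenvalues) {
        double[] sigma = new double[eigenvalues.length];
        for (int i = 0; i < eigenvalues.length; i++) {
            // BB' is positive semidefinite; clamp tiny negative round-off to zero.
            sigma[i] = Math.sqrt(Math.max(0.0, eigenvalues[i]));
        }
        Arrays.sort(sigma); // ascending order
        // Reverse in place so the largest singular value comes first.
        for (int lo = 0, hi = sigma.length - 1; lo < hi; lo++, hi--) {
            double tmp = sigma[lo];
            sigma[lo] = sigma[hi];
            sigma[hi] = tmp;
        }
        return sigma;
    }

    public static void main(String[] args) {
        // prints [3.0, 2.0, 1.0]
        System.out.println(Arrays.toString(singularValues(new double[] {4.0, 9.0, 1.0})));
    }
}
```

Note that this sketch sorts the values alone; a real solver would also have to apply the same permutation to the columns of U to keep eigenvectors and values paired.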

The idea here is that one might use it as an embedded solver via SSVDSolver, 
_or_ one might use the command-line interface. But everything else is 
encapsulated and may change.
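
The split between the two entry points could be sketched roughly as follows. This is only an illustration of the adapter pattern described above: SketchSolver and SketchCli, their option names (--input, --rank, --computeU, --computeV) and their methods are all hypothetical stand-ins, not the actual SSVDSolver or SSVDCli signatures, and no Hadoop classes are involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an embeddable solver API; not the real SSVDSolver.
final class SketchSolver {
    private final String inputGlob; // DRM input given as a glob expression
    private final int rank;
    private boolean computeU = true;
    private boolean computeV = true;

    SketchSolver(String inputGlob, int rank) {
        this.inputGlob = inputGlob;
        this.rank = rank;
    }

    void setComputeU(boolean computeU) { this.computeU = computeU; }
    void setComputeV(boolean computeV) { this.computeV = computeV; }

    /** Summarizes the configuration; a real solver would drive its jobs here. */
    String describe() {
        return "input=" + inputGlob + " rank=" + rank + " U=" + computeU + " V=" + computeV;
    }
}

// Hypothetical CLI adapter: it only parses arguments and configures the solver.
final class SketchCli {
    static SketchSolver fromArgs(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            opts.put(args[i], args[i + 1]); // simple "--name value" pairs
        }
        String input = opts.get("--input");
        if (input == null) {
            throw new IllegalArgumentException("--input is required");
        }
        SketchSolver solver =
            new SketchSolver(input, Integer.parseInt(opts.getOrDefault("--rank", "10")));
        solver.setComputeU(Boolean.parseBoolean(opts.getOrDefault("--computeU", "true")));
        solver.setComputeV(Boolean.parseBoolean(opts.getOrDefault("--computeV", "true")));
        return solver;
    }

    public static void main(String[] args) {
        System.out.println(fromArgs(args).describe());
    }
}
```

An embedding program would construct SketchSolver directly and never touch SketchCli; the command line goes through SketchCli.fromArgs, which ends up configuring the same object.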

The overarching sequence enforced by the solver is QtJob -> BBtJob -> BtJob -> 
front-end eigen solution -> (optional VJob and optional UJob in parallel).

QtJob, VJob and UJob are map-only.
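
As a rough sketch of that ordering, the snippet below builds the stage list for a given choice of optional outputs. It only models the sequence stated above; the class SsvdStagePlan, its method and the two boolean flags are hypothetical and do not reflect how the solver actually schedules its jobs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the stage order; not code from the patch.
final class SsvdStagePlan {

    /** Returns the stages in execution order; the last entry groups jobs that may run in parallel. */
    static List<String> plan(boolean computeU, boolean computeV) {
        List<String> stages = new ArrayList<>();
        stages.add("QtJob");                    // map-only
        stages.add("BBtJob");
        stages.add("BtJob");
        stages.add("front-end eigen solution"); // small computation outside map-reduce

        // VJob and UJob are both optional and map-only, and can run side by side.
        List<String> parallel = new ArrayList<>();
        if (computeV) {
            parallel.add("VJob");
        }
        if (computeU) {
            parallel.add("UJob");
        }
        if (!parallel.isEmpty()) {
            stages.add(String.join(" || ", parallel));
        }
        return stages;
    }

    public static void main(String[] args) {
        System.out.println(plan(true, true));
    }
}
```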



!ssvdclassdiag.png|height=700!

> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure 
> compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-593
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-593
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Dmitriy Lyubimov
>             Fix For: 0.5
>
>         Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, 
> MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf, ssvdclassdiag.png
>
>
> The current Mahout-376 patch requires the 'new' hadoop API. Certain elements 
> of that API (namely, multiple outputs) are not available in the standard 
> hadoop 0.20.2 release. As such, it may work only with either the CDH or the 
> 0.21 distributions. In order to bring it into sync with current Mahout 
> dependencies, a backport of the patch to the 'old' API is needed. 
> Also, some work is needed to resolve math dependencies. The existing patch 
> relies on apache commons-math 2.1 for eigen decomposition of small matrices. 
> This dependency is not currently set up in the mahout core. So, certain 
> snippets of code are either required to go to mahout-math or use Colt eigen 
> decomposition (last time I tried, my results were mixed with that one. It 
> seems to produce results inconsistent with those from the mahout-math 
> eigensolver; at the very least, it doesn't produce singular values in sorted 
> order).
> So this patch is mainly moving some Mahout-376 code around.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
