[jira] [Commented] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations

Dmitriy Lyubimov (JIRA) Tue, 20 May 2014 10:47:46 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003727#comment-14003727
 ]


Dmitriy Lyubimov commented on MAHOUT-1529:
------------------------------------------

DRM is the legacy Mahout format, inherited from all the MapReduce solvers.

Perhaps one of the most popular commands, `seq2sparse`, produces string keys 
(the full path name of each document in the original corpus). A lot of solvers 
are agnostic propagators of the keys: SSVD -> U (both MR and DSL versions), and 
so are DSPCA, thinQR, and (I think) current and future versions of factorizers 
such as ALS. For more examples of what a key can be, see "Mahout In Action" -- 
or bug the authors. Going forward, I am very likely to use more involved object 
structures internally as a key payload.
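
As a rough illustration of what "agnostic propagator of the keys" means here, 
the sketch below models a row-keyed matrix as a list of (key, vector) pairs and 
applies a row-wise operation that never inspects the key. This is not Mahout 
code: `map_rows`, `scale`, and the sample paths are hypothetical names made up 
for the example, and plain Python lists stand in for real vectors.

```python
from typing import Callable, Hashable, List, Tuple, TypeVar

# Key type is left open on purpose: int, str, or a richer object.
K = TypeVar("K", bound=Hashable)
Vector = List[float]


def map_rows(rows: List[Tuple[K, Vector]],
             f: Callable[[Vector], Vector]) -> List[Tuple[K, Vector]]:
    """Apply f to every row vector; each key passes through untouched."""
    return [(key, f(vec)) for key, vec in rows]


def scale(factor: float) -> Callable[[Vector], Vector]:
    """Build a row operation that multiplies every element by factor."""
    return lambda vec: [factor * x for x in vec]


# String keys, in the spirit of seq2sparse output (paths are invented).
docs = [("/corpus/a.txt", [1.0, 0.0, 2.0]),
        ("/corpus/b.txt", [0.0, 3.0, 1.0])]
scaled_docs = map_rows(docs, scale(2.0))

# The same operation works unchanged on integer keys.
ints = [(0, [1.0, 2.0]), (1, [3.0, 4.0])]
scaled_ints = map_rows(ints, scale(0.5))
```

Because the operation is written against an unconstrained key type, swapping 
string keys for ints or for a more involved key object requires no change to 
the operation itself.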

I honestly don't see value in a separate "local" backend as Spark already 
provides one. It is very unlikely to be used.

Tuple definitions don't depend on Spark; at this point I don't see a reason to 
make them engine-specific.






> Finalize abstraction of distributed logical plans from backend operations
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-1529
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1529
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>             Fix For: 1.0
>
>
> We have a few situations where the algorithm-facing API has Spark 
> dependencies creeping in. 
> In particular, we know of the following cases:
> -(1) checkpoint() accepts Spark constant StorageLevel directly;-
> (2) certain things in CheckpointedDRM;
> (3) drmParallelize etc. routines in the "drm" and "sparkbindings" packages;
> (5) drmBroadcast returns a Spark-specific Broadcast object.
> *Current tracker:* 
> https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-1529.
> *Pull requests are welcome*.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
