[jira] [Commented] (MAHOUT-1464) RowSimilarityJob on Spark

Dmitriy Lyubimov (JIRA) Thu, 20 Mar 2014 01:49:26 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941520#comment-13941520 ]


Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------

On Thu, Mar 20, 2014 at 1:42 AM, Sebastian Schelter (JIRA)

For that very reason, I almost always use SRM (SparseRowMatrix) and almost
never SM (SparseMatrix).

What I would really love is a block that is sparse in both rows and columns
(a hash hanging off a hash); this seems to be a recurring issue in blocked
calculations such as ALS. SRM always does that, except that it uses a
full-size vector to hang the sparse row vectors off.
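To make the "hash hanging from hash" layout concrete, here is a minimal sketch in Java, the language of Mahout's in-core math library. It assumes plain java.util.HashMap rather than Mahout's own collections, and the class SparseBlock and its methods are hypothetical, not part of Mahout. Only non-empty rows get an entry in the outer hash, so memory grows with the number of non-zeros rather than with the largest row index, whereas hanging rows off a full-size vector reserves a slot for every row index.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical block that is sparse in both rows and columns: a hash of rows,
 * each row being a hash of column index -> value. Not a Mahout class.
 */
class SparseBlock {
  // Outer hash: row index -> inner hash (column index -> value).
  private final Map<Integer, Map<Integer, Double>> rows = new HashMap<>();

  /** Sets a cell; writing 0.0 removes the cell, and the row if it becomes empty. */
  void set(int row, int col, double value) {
    if (value == 0.0) {
      Map<Integer, Double> r = rows.get(row);
      if (r != null) {
        r.remove(col);
        if (r.isEmpty()) {
          rows.remove(row);
        }
      }
      return;
    }
    // Allocate the inner row hash lazily, only when a non-zero arrives.
    rows.computeIfAbsent(row, k -> new HashMap<>()).put(col, value);
  }

  /** Returns the stored value, or 0.0 for a cell that is not stored. */
  double get(int row, int col) {
    Map<Integer, Double> r = rows.get(row);
    if (r == null) {
      return 0.0;
    }
    return r.getOrDefault(col, 0.0);
  }

  /** Number of rows holding at least one non-zero cell. */
  int nonEmptyRows() {
    return rows.size();
  }

  /** Total number of stored non-zero cells across all rows. */
  int nonZeros() {
    int n = 0;
    for (Map<Integer, Double> r : rows.values()) {
      n += r.size();
    }
    return n;
  }
}
```

For example, `set(1_000_000, 3, 2.0)` creates exactly one inner hash and allocates nothing for rows 0 through 999,999. The price of this layout is an extra hash lookup per row access compared with indexing into an array of rows.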




> RowSimilarityJob on Spark
> -------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>              Labels: performance
>             Fix For: 0.9
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch
>
>
> Create a version of RowSimilarityJob that runs on Spark. Ssc has a prototype
> here: https://gist.github.com/sscdotopen/8314254. This should be compatible
> with the Mahout Spark DRM DSL so that a DRM can be used as input.
> Ideally this would extend to cover MAHOUT-1422 which is a feature request for 
> RSJ on two inputs to calculate the similarity of rows of one DRM with those 
> of another. This cross-similarity has several applications including 
> cross-action recommendations. 
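The cross-similarity idea (similarity of each row of one matrix with each row of another) can be illustrated in-core with a small Java sketch. Cosine similarity is used here purely as an example measure; RowSimilarityJob supports several measures, and this is not its implementation. Rows are assumed to be sparse maps from column index to value, and the class CrossSimilarity is hypothetical. Entry (i, j) of the result is the dot product of row i of A with row j of B, divided by the product of their norms, i.e. a normalized A times B-transpose.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical in-core sketch of row cross-similarity; not Mahout code. */
class CrossSimilarity {
  /** Dot product of two sparse rows, iterating over the smaller one. */
  static double dot(Map<Integer, Double> x, Map<Integer, Double> y) {
    if (x.size() > y.size()) {
      Map<Integer, Double> t = x;
      x = y;
      y = t;
    }
    double sum = 0.0;
    for (Map.Entry<Integer, Double> e : x.entrySet()) {
      Double other = y.get(e.getKey());
      if (other != null) {
        sum += e.getValue() * other;
      }
    }
    return sum;
  }

  /** Euclidean norm of a sparse row. */
  static double norm(Map<Integer, Double> x) {
    return Math.sqrt(dot(x, x));
  }

  /** result[i][j] = cosine(a[i], b[j]); 0.0 when either row is all zeros. */
  static double[][] cosine(List<Map<Integer, Double>> a, List<Map<Integer, Double>> b) {
    // Precompute norms of B's rows so they are not recomputed per pair.
    double[] normB = new double[b.size()];
    for (int j = 0; j < b.size(); j++) {
      normB[j] = norm(b.get(j));
    }
    double[][] result = new double[a.size()][b.size()];
    for (int i = 0; i < a.size(); i++) {
      double normA = norm(a.get(i));
      for (int j = 0; j < b.size(); j++) {
        double denom = normA * normB[j];
        result[i][j] = denom == 0.0 ? 0.0 : dot(a.get(i), b.get(j)) / denom;
      }
    }
    return result;
  }
}
```

Passing the same list as both arguments gives ordinary row similarity, while two different lists (for example, users' purchase rows and users' view rows) give the cross-action case. A distributed version over DRMs would partition this pair computation across the cluster; the sketch only shows the per-pair arithmetic.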



--
This message was sent by Atlassian JIRA
(v6.2#6252)
