[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938333#comment-13938333
 ] 

Pat Ferrel commented on MAHOUT-1464:
------------------------------------

OK, refreshed the repo and now I see all the Spark/Scala stuff.
 
Not sure what you mean by "standalone in the cluster"? I'm just getting up to 
speed on Spark, and they describe integrating with Hadoop. I was asking because 
I need to set up a clustered environment and really only have one cluster, so 
Spark and Hadoop will coexist on the same machines. I'll probably stay on 
Hadoop 1.2.1 as Dimitriy suggests.

So you plan to add to D's PDF doc? If so, maybe it would be good to check it in 
or add it to the GitHub wiki or cwiki. Even if it's only a link to the PDF, we'd 
all know where to look for the latest.


> RowSimilarityJob on Spark
> -------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>              Labels: performance
>             Fix For: 0.9
>
>
> Create a version of RowSimilarityJob that runs on Spark. Ssc has a prototype 
> here: https://gist.github.com/sscdotopen/8314254. This should be compatible 
> with the Mahout Spark DRM DSL so a DRM can be used as input. 
> Ideally this would extend to cover MAHOUT-1422, which is a feature request for 
> RSJ on two inputs to calculate the similarity of the rows of one DRM with those 
> of another. This cross-similarity has several applications, including 
> cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
