[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939471#comment-13939471 ]
Pat Ferrel commented on MAHOUT-1464:
------------------------------------
Since there are potentially commits by Dimitriy and Sebastian around Spark,
what's the best way to track them? I assume only RSJ (RowSimilarityJob) is an
issue, since the rest will go to the trunk if changed?
Sebastian, do you plan to use git or update patches on this issue?
Dimitriy, can you send me a link to your blog post? I assume you use something
like git diff to produce the squashed patch; yes, very helpful.
> RowSimilarityJob on Spark
> -------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.9
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Labels: performance
> Fix For: 0.9
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch
>
>
> Create a version of RowSimilarityJob that runs on Spark. Ssc has a prototype
> here: https://gist.github.com/sscdotopen/8314254. This should be compatible
> with Mahout Spark DRM DSL so a DRM can be used as input.
> Ideally this would extend to cover MAHOUT-1422 which is a feature request for
> RSJ on two inputs to calculate the similarity of rows of one DRM with those
> of another. This cross-similarity has several applications including
> cross-action recommendations.