[
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939420#comment-13939420
]
Pat Ferrel commented on MAHOUT-1464:
------------------------------------
PDF in the repo is fine by me.
Can the patches just be branches in a git repo? That's really what a branch is
after all. When I make changes to Mahout I fork it, make changes in a branch
with apache/mahout as the upstream repo (you guys wouldn't even need to fork
Mahout, just stay in the branch). That would make it super easy for everyone to
follow changes to the "patch" by just pulling the latest from your branch.
With any luck these may be the FIRST Mahout jobs people use in a few years, so I
wouldn't assume they are already familiar with the in-memory code, Hadoop jobs,
the math literature, or R for that matter. Just saying that you may want to
consider the future audience.
> RowSimilarityJob on Spark
> -------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.9
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Labels: performance
> Fix For: 0.9
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch
>
>
> Create a version of RowSimilarityJob that runs on Spark. Ssc has a prototype
> here: https://gist.github.com/sscdotopen/8314254. This should be compatible
> with the Mahout Spark DRM DSL so that a DRM can be used as input.
> Ideally this would extend to cover MAHOUT-1422, which is a feature request for
> RSJ on two inputs to calculate the similarity of the rows of one DRM with those
> of another. This cross-similarity has several applications, including
> cross-action recommendations.
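
For readers new to the idea, here is a rough, in-memory sketch of what "similarity of the rows of one matrix with those of another" means. It is not the Mahout or Spark implementation and does not use the DRM DSL. The function names, the toy matrices, and the choice of cosine as the similarity measure are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_similarity(a_rows, b_rows):
    """Entry [i][j] is the similarity of row i of A with row j of B."""
    return [[cosine(u, v) for v in b_rows] for u in a_rows]


def row_similarity(rows):
    """Single-input case: compare the rows of one matrix with each other."""
    return cross_similarity(rows, rows)


# Hypothetical toy data: two small matrices with the same number of columns.
purchases = [[1, 0, 1], [0, 1, 0]]
views = [[1, 0, 1], [1, 1, 0]]

sims = cross_similarity(purchases, views)
self_sims = row_similarity(purchases)
```

The single-input case falls out as cross-similarity of a matrix with itself. A distributed version would avoid comparing every pair of rows directly, which this sketch makes no attempt to do.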