[
https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975159#comment-13975159
]
Pat Ferrel commented on MAHOUT-1518:
------------------------------------
This is excellent!
I've been thinking about this issue a lot lately, and I think there is a good
case for doing this kind of pre/post processing (call it import/export) in a
general way for several of the more black-box algorithms in Mahout, such as the
cooccurrence code, recommenders, clustering, etc.
I have in mind a set of drivers that put data into Mahout format, then call the
correct code to get things executed. At the end, a post-processing step puts
the results back into the format the user wants. This even allows for writing
to databases. I have a short description of the idea on GitHub
[https://github.com/pferrel/HarnessML/wiki/HarnessML]. The code will probably
provide a CLI and be written in Scala to interface with Mahout 2. The
cooccurrence code will be the first to go in, so this patch is very helpful
indeed.
I didn't expect you to do all this, so I'll work on it myself; any input like
this patch is greatly appreciated. If you are planning to take this on, let me
know so we don't duplicate effort.
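To make the import/export idea concrete, here is a minimal sketch in plain
Python, not Mahout code. It assumes the black-box algorithm wants dense integer
indices while the user's data carries string IDs. The names IdDictionary,
run_with_import_export and toy_cooccurrence are hypothetical, and
toy_cooccurrence is only a stand-in for whatever algorithm is being wrapped.

```python
from typing import Callable, Dict, List, Tuple


class IdDictionary:
    """Bidirectional map between external string IDs and dense int indices."""

    def __init__(self) -> None:
        self._to_index: Dict[str, int] = {}
        self._to_id: List[str] = []

    def index_of(self, external_id: str) -> int:
        # Assign the next free index the first time an ID is seen.
        if external_id not in self._to_index:
            self._to_index[external_id] = len(self._to_id)
            self._to_id.append(external_id)
        return self._to_index[external_id]

    def id_of(self, index: int) -> str:
        return self._to_id[index]


IndexPairs = List[Tuple[int, int]]


def toy_cooccurrence(encoded: IndexPairs) -> IndexPairs:
    """Stand-in black box: item index pairs that share at least one user."""
    items_by_user: Dict[int, set] = {}
    for user, item in encoded:
        items_by_user.setdefault(user, set()).add(item)
    pairs = set()
    for items in items_by_user.values():
        ordered = sorted(items)
        for position, first in enumerate(ordered):
            for second in ordered[position + 1:]:
                pairs.add((first, second))
    return sorted(pairs)


def run_with_import_export(
    interactions: List[Tuple[str, str]],
    algorithm: Callable[[IndexPairs], IndexPairs],
) -> List[Tuple[str, str]]:
    users, items = IdDictionary(), IdDictionary()
    # Import: translate (userId, itemId) strings into int indices.
    encoded = [(users.index_of(u), items.index_of(i)) for u, i in interactions]
    # Execute the black box on index data only.
    result = algorithm(encoded)
    # Export: translate item indices back into the user's own IDs.
    return [(items.id_of(a), items.id_of(b)) for a, b in result]


interactions = [("u1", "iphone"), ("u1", "ipad"), ("u2", "ipad"), ("u2", "nexus")]
similar_items = run_with_import_export(interactions, toy_cooccurrence)
```

The point of the design is that the dictionaries live in the driver, so the
wrapped algorithm never sees the external IDs and the export step can target
any output format.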
> Preprocessing for collaborative filtering with the Scala DSL
> ------------------------------------------------------------
>
> Key: MAHOUT-1518
> URL: https://issues.apache.org/jira/browse/MAHOUT-1518
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1518.patch
>
>
> The aim here is to provide some easy-to-use machinery to enable the usage of
> the new Cooccurrence Analysis code from MAHOUT-1464 with datasets represented
> as follows in a CSV file with the schema _timestamp, userId, itemId, action_,
> e.g.
> {code}
> timestamp1, userIdString1, itemIdString1, "view"
> timestamp2, userIdString2, itemIdString1, "like"
> {code}
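For illustration, reading that schema could look roughly like the plain-Python
sketch below. It is not the code from the patch. It assumes that each action
type should end up as its own list of (userId, itemId) pairs, and the function
name group_by_action is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_action(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Parse 'timestamp, userId, itemId, action' rows, grouped by action."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    # skipinitialspace lets the reader handle the blank after each comma,
    # including the one in front of the quoted action field.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # ignore empty lines
        if len(row) != 4:
            raise ValueError(f"expected 4 fields, got {len(row)}: {row}")
        _timestamp, user_id, item_id, action = (field.strip() for field in row)
        grouped[action].append((user_id, item_id))
    return dict(grouped)


sample = [
    'timestamp1, userIdString1, itemIdString1, "view"',
    'timestamp2, userIdString2, itemIdString1, "like"',
]
by_action = group_by_action(sample)
```

The timestamp is parsed but dropped here; a real driver might use it to filter
or order the interactions.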