[ 
https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975159#comment-13975159
 ] 

Pat Ferrel commented on MAHOUT-1518:
------------------------------------

This is excellent!

I've been thinking about this issue a lot lately, and I think there is a good 
case for doing this kind of pre/post-processing (call it import/export) in a 
general way for several of the more black-box algorithms in Mahout, such as the 
cooccurrence code, recommenders, clustering, etc.

I have in mind a set of drivers that put data into Mahout format and then call 
the correct code to get things executed. At the end, a post-processing step 
puts the results back into the format the user wants. This even allows for 
writing to databases. I have a short description of the idea on GitHub 
[https://github.com/pferrel/HarnessML/wiki/HarnessML]. The code will probably 
provide a CLI and be written in Scala to interface with Mahout 2. The 
cooccurrence code will be the first to go in, so this patch is very helpful 
indeed.
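
As a rough illustration of the import/export round trip described above, the 
sketch below maps external string IDs to dense integer indices on the way in 
and back to the original strings on the way out. It is not code from the patch 
or from HarnessML, and it is written in Python only for brevity; the class 
IdDictionary and its methods index_of and id_of are hypothetical names chosen 
for this example.

```python
from typing import Dict, List


class IdDictionary:
    """Hypothetical two-way mapping: external string IDs <-> dense int indices."""

    def __init__(self) -> None:
        self._to_index: Dict[str, int] = {}
        self._to_id: List[str] = []

    def index_of(self, external_id: str) -> int:
        # Import side: assign the next free index the first time an ID is seen.
        if external_id not in self._to_index:
            self._to_index[external_id] = len(self._to_id)
            self._to_id.append(external_id)
        return self._to_index[external_id]

    def id_of(self, index: int) -> str:
        # Export side: translate an internal index back to the user's ID.
        return self._to_id[index]


users = IdDictionary()
rows = [users.index_of(u) for u in ["alice", "bob", "alice"]]
exported = [users.id_of(r) for r in rows]
```

A driver built this way would keep one dictionary per ID space (users, items) 
so that results can be written out in the user's own identifiers.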

I didn't expect you to do all this, so I'll work on it; any input like this 
patch is greatly appreciated. If you are planning to take this on, let me know 
so I don't duplicate effort.

> Preprocessing for collaborative filtering with the Scala DSL
> ------------------------------------------------------------
>
>                 Key: MAHOUT-1518
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1518
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1518.patch
>
>
> The aim here is to provide some easy-to-use machinery to enable the usage of 
> the new Cooccurrence Analysis code from MAHOUT-1464 with datasets represented 
> as follows in a CSV file with the schema _timestamp, userId, itemId, action_, 
> e.g.
> {code}
> timestamp1, userIdString1, itemIdString1, "view"
> timestamp2, userIdString2, itemIdString1, "like"
> {code}
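
For illustration, one way to read records with the schema above and split them 
by action is sketched here. This is a minimal example, not the attached patch: 
it is written in Python for brevity rather than the Scala DSL, it assumes the 
action field uses plain ASCII double quotes, and the function name 
interactions_by_action is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Sample rows following the schema: timestamp, userId, itemId, action
SAMPLE = '''timestamp1, userIdString1, itemIdString1, "view"
timestamp2, userIdString2, itemIdString1, "like"
'''


def interactions_by_action(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group (userId, itemId) pairs by their action label."""
    grouped = defaultdict(list)
    # skipinitialspace drops the blank after each comma so quoted fields parse.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for record in reader:
        if len(record) != 4:
            continue  # skip blank or malformed lines
        _timestamp, user_id, item_id, action = (f.strip() for f in record)
        grouped[action].append((user_id, item_id))
    return dict(grouped)


grouped = interactions_by_action(SAMPLE)
```

Each per-action list could then be turned into its own user-by-item 
interaction matrix before being handed to the cooccurrence code.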



--
This message was sent by Atlassian JIRA
(v6.2#6252)
