[ 
https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977222#comment-13977222
 ] 

Sebastian Schelter commented on MAHOUT-1518:
--------------------------------------------

This is more or less a quick hack to give Pat what he wanted to test the 
cooccurrence code in the new DSL. It is not, and was never intended as, a 
general solution. 

But I hope that it shows what we need in terms of usability: We need a 
data structure that lets users easily load their data, even if the keys are 
strings or non-consecutive numeric ids. From that, the users need to be able 
to extract a DRM (distributed row matrix), run an algorithm and map the result 
back to their original keys. 
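
To make the idea concrete, here is a minimal, language-neutral sketch in plain 
Python. It is not the attached patch and not Mahout's API: the names 
IdDictionary, load_interactions and cooccurrence_counts are hypothetical, a 
set of (row, column) index pairs stands in for the DRM, and a simple 
pairwise count stands in for the real cooccurrence analysis. 

```python
import csv
import io
from collections import defaultdict


class IdDictionary:
    """Hypothetical two-way mapping between arbitrary keys and consecutive ints."""

    def __init__(self):
        self._index_of = {}
        self._key_of = []

    def index(self, key):
        # Assign the next free index the first time a key is seen.
        if key not in self._index_of:
            self._index_of[key] = len(self._key_of)
            self._key_of.append(key)
        return self._index_of[key]

    def key(self, index):
        return self._key_of[index]

    def __len__(self):
        return len(self._key_of)


def load_interactions(lines, action):
    """Parse 'timestamp, userId, itemId, action' rows, keeping one action type."""
    users, items = IdDictionary(), IdDictionary()
    cells = set()  # (user index, item index) pairs; stands in for the DRM
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        _timestamp, user, item, row_action = (field.strip() for field in row)
        if row_action == action:
            cells.add((users.index(user), items.index(item)))
    return users, items, cells


def cooccurrence_counts(cells):
    """Stand-in algorithm: number of users shared by each pair of distinct items."""
    items_by_user = defaultdict(set)
    for user_idx, item_idx in cells:
        items_by_user[user_idx].add(item_idx)
    counts = defaultdict(int)
    for item_indices in items_by_user.values():
        for i in item_indices:
            for j in item_indices:
                if i != j:
                    counts[(i, j)] += 1
    return counts


# Made-up sample data in the schema from the issue description.
SAMPLE = """\
t1, alice, book, "view"
t2, bob, book, "view"
t3, alice, lamp, "view"
t4, bob, lamp, "view"
t5, bob, sofa, "like"
"""

users, items, cells = load_interactions(io.StringIO(SAMPLE), action="view")
counts = cooccurrence_counts(cells)

# Map the integer-indexed result back to the original string keys.
by_key = {(items.key(i), items.key(j)): n for (i, j), n in counts.items()}
```

The point of the sketch is only the round trip: the algorithm sees nothing but 
consecutive integer indices, while the dictionaries stay with the data so the 
result can be reported in terms of the user's own ids. 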

This concept is also found in the MLTable and MLNumericTable proposed for MLI 
in http://arxiv-web3.library.cornell.edu/pdf/1310.5426v2.pdf

> Preprocessing for collaborative filtering with the Scala DSL
> ------------------------------------------------------------
>
>                 Key: MAHOUT-1518
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1518
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1518.patch
>
>
> The aim here is to provide some easy-to-use machinery to enable the usage of 
> the new Cooccurrence Analysis code from MAHOUT-1464 with datasets represented 
> as follows in a CSV file with the schema _timestamp, userId, itemId, action_, 
> e.g.
> {code}
> timestamp1, userIdString1, itemIdString1, "view"
> timestamp2, userIdString2, itemIdString1, "like"
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
