Yes, right: this is modeled on an R data frame, and we will need something similar. We will need somewhat richer operations than just labeling, though (e.g. a notion of missing values, vectorization with standardization, etc.).
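
As a rough illustration of these operations, here is a minimal sketch in plain Python. It is not Mahout code and not the Scala DSL: every function and variable name is hypothetical, and the sample rows are invented to follow the _timestamp, userId, itemId, action_ schema from the ticket. It assigns consecutive integer ids to arbitrary string keys, maps a result back to the original keys, and standardizes a column while leaving missing values missing.

```python
import math
from collections import defaultdict

# Made-up interaction log: (timestamp, userId, itemId, action).
rows = [
    ("t1", "alice", "book-17", "view"),
    ("t2", "bob", "book-17", "like"),
    ("t3", "alice", "dvd-3", "like"),
]


def build_index(keys):
    """Assign consecutive integer ids to arbitrary keys, in first-seen order."""
    index = {}
    for key in keys:
        index.setdefault(key, len(index))
    return index


user_ids = build_index(r[1] for r in rows)
item_ids = build_index(r[2] for r in rows)

# Sparse user-by-item matrix keyed by the integer ids (a stand-in for a DRM).
matrix = defaultdict(dict)
for _, user, item, _ in rows:
    matrix[user_ids[user]][item_ids[item]] = 1.0

# Reverse dictionary, used to map integer results back to the original keys.
item_keys = {i: key for key, i in item_ids.items()}
alice_items = sorted(item_keys[j] for j in matrix[user_ids["alice"]])


def standardize(column):
    """Z-score a numeric column; None marks a missing value and is preserved."""
    present = [x for x in column if x is not None]
    if not present:
        return list(column)
    mean = sum(present) / len(present)
    variance = sum((x - mean) ** 2 for x in present) / len(present)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero on constant columns
    return [None if x is None else (x - mean) / std for x in column]
```

The point of keeping the key dictionaries next to the matrix is that the algorithm only ever sees integer indices, while the user only ever sees their own keys.
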

On Tue, Apr 22, 2014 at 12:10 PM, Sebastian Schelter (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977222#comment-13977222 ]
>
> Sebastian Schelter commented on MAHOUT-1518:
> --------------------------------------------
>
> This is more or less a quick shot to give Pat what he wanted to test the
> cooccurrence code in the new DSL. This is not and was not intended as a
> general solution.
>
> But I hope that it shows what we need in terms of usability: We need a
> datastructure that easily allows the users to load their data, even if it
> does not have consecutive numeric ids or strings as key. From that, the
> users need to be able to extract a DRM, run an algorithm and map the result
> back to their original keys.
>
> This concept is also found in the MLTable and MLNumericTable proposed for
> MLI in http://arxiv-web3.library.cornell.edu/pdf/1310.5426v2.pdf
>
> > Preprocessing for collaborative filtering with the Scala DSL
> > ------------------------------------------------------------
> >
> >           Key: MAHOUT-1518
> >           URL: https://issues.apache.org/jira/browse/MAHOUT-1518
> >       Project: Mahout
> >    Issue Type: New Feature
> >    Components: Collaborative Filtering
> >      Reporter: Sebastian Schelter
> >      Assignee: Sebastian Schelter
> >       Fix For: 1.0
> >
> >   Attachments: MAHOUT-1518.patch
> >
> >
> > The aim here is to provide some easy-to-use machinery to enable the
> > usage of the new Cooccurrence Analysis code from MAHOUT-1464 with datasets
> > represented as follows in a CSV file with the schema _timestamp, userId,
> > itemId, action_, e.g.
> > {code}
> > timestamp1, userIdString1, itemIdString1, "view"
> > timestamp2, userIdString2, itemIdString1, "like"
> > {code}
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
