[ 
https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977400#comment-13977400
 ] 

Pat Ferrel commented on MAHOUT-1518:
------------------------------------

Tell me how to avoid it.

1. Call the new type a "Table". We will need transpose (Excel calls it pivot), 
rows will need to be clustered, we will need to extract rows and columns, and 
we will need to compare the columns of one table with those of another or of 
itself. These can be phrased in algebraic terms or "Table-ish" terms but they 
are fundamentally the same. 

2. Or we can support the algebraic formulations with the new type.

#1 requires a bunch of wrapper jobs; #2 requires the core math to deal with 
dictionaries. Either method is a pain, but there is no doubt about the need. You 
pretty much convinced me of #1, and now you say we shouldn't even do that?
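
To make #1 concrete, here is a rough sketch of what a "Table" wrapper might 
look like: a plain numeric matrix plus row and column dictionaries, with 
transpose, row/column extraction and a column comparison on top. The names 
(Table, row, column, columnDot) are made up for illustration and are not part 
of Mahout or its DSL, and a real version would wrap a distributed matrix 
rather than in-memory vectors.

```scala
// Hypothetical sketch of option #1: labels live in the wrapper, numbers stay
// in a plain matrix. None of these names come from Mahout.
final case class Table(
    rowIds: Vector[String],
    colIds: Vector[String],
    values: Vector[Vector[Double]]) {

  require(values.length == rowIds.length, "one row of values per row ID")
  require(values.forall(_.length == colIds.length), "one value per column ID")

  // Transpose the numbers and swap the two dictionaries along with them.
  def transpose: Table =
    Table(colIds, rowIds, colIds.indices.map(j => values.map(_(j))).toVector)

  // Extract a row by its external ID, if that ID is known.
  def row(id: String): Option[Vector[Double]] = {
    val i = rowIds.indexOf(id)
    if (i >= 0) Some(values(i)) else None
  }

  // Extract a column by its external ID: a row of the transposed table.
  def column(id: String): Option[Vector[Double]] = transpose.row(id)

  // Compare two columns of this table with a dot product.
  def columnDot(a: String, b: String): Option[Double] =
    for {
      x <- column(a)
      y <- column(b)
    } yield x.zip(y).map { case (p, q) => p * q }.sum
}
```

With users as rows and items as columns, columnDot on two item IDs counts the 
users who interacted with both when the values are 0/1 indicators. The 
algebra underneath never sees a string ID, which is the point of doing it in 
a wrapper.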
 

> Preprocessing for collaborative filtering with the Scala DSL
> ------------------------------------------------------------
>
>                 Key: MAHOUT-1518
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1518
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1518.patch
>
>
> The aim here is to provide some easy-to-use machinery to enable the usage of 
> the new Cooccurrence Analysis code from MAHOUT-1464 with datasets represented 
> as follows in a CSV file with the schema _timestamp, userId, itemId, action_, 
> e.g.
> {code}
> timestamp1, userIdString1, itemIdString1, "view"
> timestamp2, userIdString2, itemIdString1, "like"
> {code}
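
For reference, the dictionary side of that preprocessing could be sketched 
roughly as below: parse each line of the _timestamp, userId, itemId, action_ 
schema and assign every distinct string ID a dense integer index. Only the 
four-column schema comes from the issue description. The names (Preprocessing, 
Interaction, parseLine, buildDictionary) are hypothetical, and the sketch 
assumes comma-separated fields with no embedded commas and optional straight 
double quotes around the action.

```scala
object Preprocessing {

  // One parsed line of the CSV schema described in the issue.
  final case class Interaction(
      timestamp: String,
      userId: String,
      itemId: String,
      action: String)

  // Parse a single line; lines without exactly four fields yield None.
  def parseLine(line: String): Option[Interaction] =
    line.split(",").map(_.trim) match {
      case Array(ts, user, item, action) =>
        // Assumption: the action may be wrapped in straight double quotes.
        Some(Interaction(ts, user, item, action.stripPrefix("\"").stripSuffix("\"")))
      case _ => None
    }

  // Map each distinct external ID to a dense index, in order of first appearance.
  def buildDictionary(ids: Seq[String]): Map[String, Int] =
    ids.distinct.zipWithIndex.toMap
}
```

Building one dictionary from the user IDs and one from the item IDs gives the 
row and column indices of an interaction matrix, and inverting the maps 
recovers the original strings for output.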



--
This message was sent by Atlassian JIRA
(v6.2#6252)
