[ 
https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860939#action_12860939
 ] 

Ankur commented on MAHOUT-305:
------------------------------

> But the answer is the partitioner ?
Yes

> Am I right that (item1, item2) ->count is all that's needed ?
Yes

> And why is the priority queue needed ...

You could use both a co-occurrence count threshold (your favorite) and a cap 
on the number of co-occurring pairs kept per item (say 1000). I have chosen a 
cap of 100, so for any given item only the top-100 co-occurring items (by 
count) are output. Although this limits the output size, very long histories 
can still cause an explosion of pairs. Recall the users in the Netflix 
dataset who have rated more than 10K movies. One way of handling them is to 
apply 'sessionization', i.e. output a co-occurrence pair only if both items 
are part of the same session or satisfy some other constraint. But that is 
not implemented yet. 
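
To make the priority queue point concrete, here is a minimal sketch in plain 
Java of keeping only the top-N co-occurring items for one item. It is not the 
code from the patch: the class name TopCooccurrences, the method topN and the 
use of a Map of counts as input are all assumptions made for illustration. In 
the real job the counts would arrive in the reducer rather than in a Map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of Mahout.
final class TopCooccurrences {

  /**
   * Returns at most maxSize item IDs from countsByItem, ordered by
   * descending co-occurrence count.
   */
  static List<Long> topN(Map<Long, Integer> countsByItem, int maxSize) {
    List<Long> result = new ArrayList<>();
    if (maxSize <= 0) {
      return result;
    }
    // Min-heap keyed on count: the head is the weakest entry kept so far.
    // Each element is {itemID, count}.
    Comparator<long[]> byCount = (a, b) -> Long.compare(a[1], b[1]);
    PriorityQueue<long[]> heap = new PriorityQueue<>(byCount);

    for (Map.Entry<Long, Integer> e : countsByItem.entrySet()) {
      long item = e.getKey();
      long count = e.getValue();
      if (heap.size() < maxSize) {
        heap.add(new long[] {item, count});
      } else if (count > heap.peek()[1]) {
        // New entry beats the weakest one, so replace it.
        heap.poll();
        heap.add(new long[] {item, count});
      }
    }

    // poll() yields ascending counts; inserting at the front reverses that.
    while (!heap.isEmpty()) {
      result.add(0, heap.poll()[0]);
    }
    return result;
  }
}
```

With maxSize set to 100 this never holds more than 100 entries per item, 
which is why the output stays bounded. It does nothing about the number of 
pairs generated upstream from very long histories.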

> TupleWritable ...
Not really. I have a specialized implementation for my own purposes that uses 
GenericWritable to wrap each object in the TupleWritable.


> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make 
> recommendations based on item co-occurrence: 
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be 
> merged. Not sure exactly how to approach that but noting this in JIRA, per 
> Ankur.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
