[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860939#action_12860939 ]
Ankur commented on MAHOUT-305:
------------------------------
> But the answer is the partitioner ?
Yes
> Am I right that (item1, item2) ->count is all that's needed ?
Yes
> And why is the priority queue needed ...
You could use both a co-occurrence count (your favorite) and a maximum number
of co-occurring pairs per item (say 1000). I have chosen a size of 100, so for
any given item the top 100 co-occurring items (by count) are output. Although
this limits the output size, very long histories can still cause an explosion
in the number of pairs generated. Recall the users in the Netflix dataset who
have rated more than 10K movies. One way of taking care of them is to apply
'sessionization', i.e. output a co-occurrence pair only if both items are part
of the same session or satisfy some other constraint. But that is not
implemented yet.
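To make the role of the priority queue concrete, here is a minimal in-memory
sketch in Python. It is not the Mahout job. The function names, the
`max_pairs` and `min_count` parameters, and the sample histories are all
assumptions made for illustration, and using the count as a lower cutoff is
only one guess at how it might be applied. The sketch counts ordered
(item1, item2) pairs per history, then keeps a bounded min-heap per item so
that only the highest-count partners survive. Note that a history of n
distinct items still emits n * (n - 1) ordered pairs before any capping, so a
10K-item history yields roughly 10^8 pairs, which is the explosion described
above.

```python
import heapq
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_counts(histories):
    """Count ordered (item1, item2) pairs that share a user history."""
    counts = Counter()
    for items in histories:
        # set() so an item repeated within one history is counted once
        for a, b in permutations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


def top_cooccurrences(counts, max_pairs=100, min_count=1):
    """Keep at most max_pairs partners per item, highest count first.

    min_count is a hypothetical cutoff: pairs seen fewer times are dropped.
    """
    heaps = defaultdict(list)  # item -> min-heap of (count, other_item)
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        heap = heaps[a]
        if len(heap) < max_pairs:
            heapq.heappush(heap, (n, b))
        elif (n, b) > heap[0]:
            # New pair beats the weakest kept pair, so swap it in
            heapq.heapreplace(heap, (n, b))
    return {item: sorted(heap, reverse=True) for item, heap in heaps.items()}


# Made-up sample data: one list of item ids per user
histories = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "d"]]
counts = cooccurrence_counts(histories)
top = top_cooccurrences(counts, max_pairs=2)
```

The heap holds at most `max_pairs` entries per item, so memory per item stays
bounded regardless of how many partners an item has. It does nothing about the
number of pairs generated upstream, which is why a constraint such as
sessionization would still be needed for very long histories.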
> TupleWritable ...
Not really. I have a specialized implementation for my own purpose using
GenericWritable that wraps each object of TupleWritable.
> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
> Key: MAHOUT-305
> URL: https://issues.apache.org/jira/browse/MAHOUT-305
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.2
> Reporter: Sean Owen
> Assignee: Ankur
> Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.