[ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861171#action_12861171 ]

Ted Dunning commented on MAHOUT-305:
------------------------------------

{quote}
Ted says he ... doesn't like throwing out the low-count co-occurrences.

I agree, in the sense that low count doesn't mean unimportant. It's LLR that 
figures out whether a co-occurrence is meaningless or contains a lot of info.
{quote}
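The quoted point is that the log-likelihood ratio (LLR) test, not the raw count, should decide whether a co-occurrence means anything. Below is a minimal Python sketch of Dunning's LLR (G^2) for one item pair's 2x2 contingency table. It is illustrative only, not Mahout's Java implementation, and the function names and argument order are assumptions.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) == 0
    return x * math.log(x) if x > 0 else 0.0

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: users with both A and B    k12: users with A but not B
    k21: users with B but not A     k22: users with neither
    """
    n = k11 + k12 + k21 + k22
    cells = x_log_x(k11) + x_log_x(k12) + x_log_x(k21) + x_log_x(k22)
    rows = x_log_x(k11 + k12) + x_log_x(k21 + k22)
    cols = x_log_x(k11 + k21) + x_log_x(k12 + k22)
    # G^2 = 2 * sum k_ij * ln(k_ij * N / (row_i * col_j)), expanded into x*ln(x) terms
    g2 = 2.0 * (cells - rows - cols + x_log_x(n))
    return max(0.0, g2)  # clamp tiny negative rounding error

print(round(llr(5, 0, 0, 10000), 1))  # 86.0: rare, but always together
print(round(llr(1, 9, 9, 81), 6))     # 0.0: exactly what independence predicts
```

A pair seen only 5 times among 10,005 users, but always together, scores about 86, while counts that exactly match independence score 0 whether they are small (1, 9, 9, 81) or large (100, 900, 900, 8100). The raw count alone does not tell you which case you are in.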

Close.  But I would go further and say that, on average, individual high-count 
data records are less useful than low-count ones, and they are quadratically 
more expensive to deal with (a record with n items generates on the order of 
n^2 co-occurrence pairs).  That combination of much higher expense and 
considerably lower value makes it a good idea to nuke (aka downsample) those 
records rather than lose the low-count stuff.  
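To make the downsampling argument concrete: a record with n distinct items yields n*(n-1)/2 item pairs, so one heavy user can dominate the cost of co-occurrence counting. The Python sketch below caps each user's history at a hypothetical max_items by uniform random sampling before counting pairs. This is one simple way to downsample, assumed here for illustration; it is not necessarily how Mahout does it.

```python
import random
from collections import Counter
from itertools import combinations

def downsample(history, max_items, rng):
    """Return at most max_items distinct items from one user's history."""
    items = sorted(set(history))
    if len(items) <= max_items:
        return items
    # Uniform random sample without replacement (an assumed sampling policy)
    return sorted(rng.sample(items, max_items))

def cooccurrence_counts(histories, max_items, seed=0):
    """Count item pairs across users after capping each history at max_items."""
    rng = random.Random(seed)
    counts = Counter()
    for history in histories:
        items = downsample(history, max_items, rng)
        # n items produce n*(n-1)/2 pairs: this is the quadratic cost
        for pair in combinations(items, 2):
            counts[pair] += 1
    return counts

light = [1, 2, 3]
heavy = list(range(1000))
counts = cooccurrence_counts([light, heavy], max_items=100)
print(sum(counts.values()))  # 4953 = 3 + 100*99/2
```

With max_items=100, the heavy user contributes 4,950 pairs instead of 499,500, while the light user still contributes all 3 of its pairs, so none of the low-count evidence is lost.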

Dropping low-count items in the combiner is even worse, since there might be 
quite a number of them scattered across mappers that would have added up to 
interesting levels.
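The combiner problem can be simulated in a few lines. In this Python sketch, combine stands in for a hypothetical combiner that sums one mapper's pair counts and optionally prunes those below min_count, and reduce_all sums the partial counts; the names and the threshold are illustrative assumptions.

```python
from collections import Counter

def combine(pairs, min_count=0):
    """Hypothetical combiner: sum one mapper's pair counts, dropping any below min_count."""
    counts = Counter(pairs)
    return Counter({pair: c for pair, c in counts.items() if c >= min_count})

def reduce_all(partials):
    """Reducer: add up the partial counts emitted by all mappers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Ten mappers each see the pair ("a", "b") exactly once.
mapper_outputs = [[("a", "b")] for _ in range(10)]

exact = reduce_all(combine(out) for out in mapper_outputs)
pruned = reduce_all(combine(out, min_count=2) for out in mapper_outputs)

print(exact[("a", "b")])   # 10
print(pruned[("a", "b")])  # 0
```

Each mapper's partial count is 1, so pruning at 2 discards the pair everywhere, even though its global count of 10 would clear that threshold. A threshold applied only after the reducer has the full sum would not lose these counts.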



> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make 
> recommendations based on item co-occurrence: 
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be 
> merged. Not sure exactly how to approach that, but noting this in JIRA, per 
> Ankur.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
