[ 
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751788#action_12751788
 ] 

Ted Dunning commented on MAHOUT-157:
------------------------------------


.bq Need a faster implementation of a bounded SortedSet. Currently use an 
encapsulated TreeSet where least value is removed after every insert call  if 
the max capacity is found to be exceeded. Another frequent operation is: merge 
two bounded SortedSets

This sounds more like a priority queue and a sorted set.  Since the priority 
queue doesn't need keep all elments in sorted order, it may be cheaper than a 
SortedSet.

On a separate note, it is common that the most impressive optimization for 
keeping top-n elements is to simply avoid inserting into the priority queue if 
you know that it will just get kicked out.  This can be done by looking at the 
score on the current lowest element.  The next step is to cache that lowest 
score.  This can often decrease the number of heap insertions by several orders 
of magnitude.

> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
>                 Key: MAHOUT-157
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-157
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>             Fix For: 0.2
>
>         Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, 
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to