[
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751788#action_12751788
]
Ted Dunning commented on MAHOUT-157:
------------------------------------
.bq Need a faster implementation of a bounded SortedSet. Currently use an
encapsulated TreeSet where least value is removed after every insert call if
the max capacity is found to be exceeded. Another frequent operation is: merge
two bounded SortedSets
This sounds more like a priority queue and a sorted set. Since the priority
queue doesn't need keep all elments in sorted order, it may be cheaper than a
SortedSet.
On a separate note, it is common that the most impressive optimization for
keeping top-n elements is to simply avoid inserting into the priority queue if
you know that it will just get kicked out. This can be done by looking at the
score on the current lowest element. The next step is to cache that lowest
score. This can often decrease the number of heap insertions by several orders
of magnitude.
> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
> Key: MAHOUT-157
> URL: https://issues.apache.org/jira/browse/MAHOUT-157
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.2
> Reporter: Robin Anil
> Fix For: 0.2
>
> Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch,
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch,
> MAHOUT-157-Combinations-BSD-License.patch,
> MAHOUT-157-Combinations-BSD-License.patch,
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.