[
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751923#action_12751923
]
Ted Dunning commented on MAHOUT-157:
------------------------------------
Hmm...
You can get a significant improvement on the merge time because you may be able
to insert in a different order which allows you to guess pretty accurates which
half of the two queues you will not want to insert. That is only a (small)
constant improvement, however.
I am still curious to know if you can even get a priority queue that has linear
expected behavior.
What is the structure of the score? Is it possible to store it as a fixed
point number? If so, then you can use radix sort to advantage. Even without
the integer nature, you should be able to break the score range into roughly
equal sized chunks which are kept in unordered lists. After some number of
insertions, it will become clear that an entire chunk should be lost. After a
large number, you will lose enough chunks that you need to rechunk the score
domain. My guess is that the asymptotic behavior will not be much better
unless you can be very clever about partial rechunking.
A final question is whether a dedicated priority queue that keeps the score as
a primitive will help. The benefits could be similar to those gained by
primitive hashMaps.
> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
> Key: MAHOUT-157
> URL: https://issues.apache.org/jira/browse/MAHOUT-157
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.2
> Reporter: Robin Anil
> Fix For: 0.2
>
> Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch,
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch,
> MAHOUT-157-Combinations-BSD-License.patch,
> MAHOUT-157-Combinations-BSD-License.patch,
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.