[ 
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751923#action_12751923
 ] 

Ted Dunning commented on MAHOUT-157:
------------------------------------


Hmm...

You can get a significant improvement on the merge time because you may be able 
to insert in a different order which allows you to guess pretty accurates which 
half of the two queues you will not want to insert.  That is only  a (small) 
constant improvement, however.

I am still curious to know if you can even get a priority queue that has linear 
expected behavior.

What is the structure of the score?  Is it possible to store it as a fixed 
point number?  If so, then you can use radix sort to advantage.  Even without 
the integer nature, you should be able to break the score range into roughly 
equal sized chunks which are kept in unordered lists.  After some number of 
insertions, it will become clear that an entire chunk should be lost.  After a 
large number, you will lose enough chunks that you need to rechunk the score 
domain.  My guess is that the asymptotic behavior will not be much better 
unless you can be very clever about partial rechunking.

A final question is whether a dedicated priority queue that keeps the score as 
a primitive will help.  The benefits could be similar to those gained by 
primitive hashMaps.

  

> Frequent Pattern Mining using Parallel FP-Growth
> ------------------------------------------------
>
>                 Key: MAHOUT-157
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-157
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>             Fix For: 0.2
>
>         Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, 
> MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-Combinations-BSD-License.patch, 
> MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-September-5.patch
>
>
> Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to