+1 Pedal-to-the-metal

On 9/30/10 5:41 PM, Sean Owen (JIRA) wrote:
     [ 
https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916690#action_12916690
 ]

Sean Owen commented on MAHOUT-344:
----------------------------------

I'm gonna submit my flavor of Ankur's patches, with "new Random(11)" left in 
place. We can iterate from there. Cool with all?

Minhash based clustering
-------------------------

                 Key: MAHOUT-344
                 URL: https://issues.apache.org/jira/browse/MAHOUT-344
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.3
            Reporter: Ankur
            Assignee: Ankur
             Fix For: 0.4

         Attachments: MAHOUT-344-v1.patch, MAHOUT-344-v2.patch, 
MAHOUT-344-v3.patch, MAHOUT-344-v4.patch, MAHOUT-344-v5.patch, 
MAHOUT-344-v6.patch, MAHOUT-344-v7.patch


Minhash clustering performs probabilistic dimension reduction of 
high-dimensional data. The essence of the technique is to hash each item using 
multiple independent hash functions such that similar items are more likely to 
collide than dissimilar ones. Multiple such hash tables can then be constructed 
to answer near-neighbor queries efficiently.
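
The description above can be illustrated with a minimal sketch. It is not the code in the attached patches. It assumes items are non-empty sets of integer feature ids and uses a simple linear hash family; the names make_hash_functions, minhash_signature, band_keys and estimate_similarity are hypothetical. The fixed seed only mirrors the "new Random(11)" mentioned in the comment, so that results are reproducible.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus (assumption)


def make_hash_functions(count, seed=11):
    """Return `count` (a, b) pairs defining h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)  # fixed seed keeps signatures reproducible
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(count)]


def minhash_signature(features, hash_functions):
    """For each hash function, keep the minimum hash value over the item's features."""
    return [min((a * x + b) % PRIME for x in features) for a, b in hash_functions]


def band_keys(signature, rows_per_band):
    """Split a signature into bands; each (band index, values) pair keys one hash table."""
    return [(i // rows_per_band, tuple(signature[i:i + rows_per_band]))
            for i in range(0, len(signature), rows_per_band)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Two items that share a band key land in the same bucket of that band's table, so candidate neighbors are found by bucket lookup rather than by comparing every pair. More rows per band makes collisions stricter; more bands makes them more likely.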
