+1 from me as well.

On Thu, Sep 30, 2010 at 2:45 PM, Jeff Eastman <[email protected]> wrote:
> +1 Pedal-to-the-metal
>
>
> On 9/30/10 5:41 PM, Sean Owen (JIRA) wrote:
>
>> [ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916690#action_12916690 ]
>>
>> Sean Owen commented on MAHOUT-344:
>> ----------------------------------
>>
>> I'm gonna submit my flavor of Ankur's patches, with "new Random(11)" left
>> in place. We can iterate from there. Cool with all?
>>
>>> Minhash based clustering
>>> -------------------------
>>>
>>> Key: MAHOUT-344
>>> URL: https://issues.apache.org/jira/browse/MAHOUT-344
>>> Project: Mahout
>>> Issue Type: Bug
>>> Components: Clustering
>>> Affects Versions: 0.3
>>> Reporter: Ankur
>>> Assignee: Ankur
>>> Fix For: 0.4
>>>
>>> Attachments: MAHOUT-344-v1.patch, MAHOUT-344-v2.patch,
>>> MAHOUT-344-v3.patch, MAHOUT-344-v4.patch, MAHOUT-344-v5.patch,
>>> MAHOUT-344-v6.patch, MAHOUT-344-v7.patch
>>>
>>>
>>> Minhash clustering performs probabilistic dimension reduction of high
>>> dimensional data. The essence of the technique is to hash each item using
>>> multiple independent hash functions such that the probability of collision
>>> of similar items is higher. Multiple such hash tables can then be
>>> constructed to answer near neighbor type of queries efficiently.
>>>
>>
>
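For anyone following along who is new to the technique, the issue description quoted above can be illustrated with a small sketch. This is a minimal Python illustration under stated assumptions, not Mahout's implementation or API: the hash family h(x) = (a*x + b) mod P, the function names, and the parameters `num_hashes` and `band_size` are all hypothetical. Items are sets of integer feature ids; each item gets a minhash signature (the minimum of each hash function over its features), and each band of `band_size` consecutive signature values acts as one hash table, so items sharing a bucket in any table become near-neighbor candidates. The default seed of 11 only mirrors the "new Random(11)" mentioned in Sean's comment.

```python
import random
from collections import defaultdict

# Assumption: a Mersenne prime modulus for the hash family h(x) = (a*x + b) mod P.
P = (1 << 61) - 1


def make_hash_funcs(num_hashes, seed=11):
    """Draw num_hashes (a, b) pairs from a seeded RNG so runs are reproducible."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(num_hashes)]


def minhash_signature(features, hash_funcs):
    """For each hash function, keep the minimum hash value over the item's feature ids.

    Two sets share a minimum with probability approximately equal to their
    Jaccard similarity, so similar items are more likely to collide.
    """
    return [min((a * f + b) % P for f in features) for a, b in hash_funcs]


def bucket_items(items, num_hashes=20, band_size=2, seed=11):
    """Hypothetical helper: group items whose signatures agree on a whole band.

    Each band of band_size consecutive signature values plays the role of one
    hash table; buckets holding more than one item are candidate clusters.
    """
    funcs = make_hash_funcs(num_hashes, seed)
    tables = defaultdict(set)
    for item_id, features in items.items():
        sig = minhash_signature(features, funcs)
        for start in range(0, num_hashes, band_size):
            # Include the band index in the key so different tables stay separate.
            key = (start, tuple(sig[start:start + band_size]))
            tables[key].add(item_id)
    return [bucket for bucket in tables.values() if len(bucket) > 1]
```

For example, `bucket_items({"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {100, 200, 300}})` will usually place "a" and "b" in at least one shared bucket while "c" stays on its own. Raising `band_size` makes each table stricter (fewer false positives, more missed neighbors), while adding more bands recovers recall at the cost of more hashing.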
