+1 from me as well.

On Thu, Sep 30, 2010 at 2:45 PM, Jeff Eastman <[email protected]> wrote:

>  +1 Pedal-to-the-metal
>
>
> On 9/30/10 5:41 PM, Sean Owen (JIRA) wrote:
>
>>     [
>> https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916690#action_12916690]
>>
>> Sean Owen commented on MAHOUT-344:
>> ----------------------------------
>>
>> I'm gonna submit my flavor of Ankur's patches, with "new Random(11)" left
>> in place. We can iterate from there. Cool with all?
>>
>>  Minhash based clustering
>>> -------------------------
>>>
>>>                 Key: MAHOUT-344
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-344
>>>             Project: Mahout
>>>          Issue Type: Bug
>>>          Components: Clustering
>>>    Affects Versions: 0.3
>>>            Reporter: Ankur
>>>            Assignee: Ankur
>>>             Fix For: 0.4
>>>
>>>         Attachments: MAHOUT-344-v1.patch, MAHOUT-344-v2.patch,
>>> MAHOUT-344-v3.patch, MAHOUT-344-v4.patch, MAHOUT-344-v5.patch,
>>> MAHOUT-344-v6.patch, MAHOUT-344-v7.patch
>>>
>>>
>>> Minhash clustering performs probabilistic dimension reduction of high
>>> dimensional data. The essence of the technique is to hash each item using
>>> multiple independent hash functions such that similar items are more
>>> likely to collide than dissimilar ones. Multiple such hash tables can then
>>> be constructed to answer near-neighbor queries efficiently.
>>>
>>
>
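For readers new to the technique, the minhash idea described in the issue can be sketched in Java. This is a minimal, hypothetical illustration, not the MAHOUT-344 patch: the class `MinHashSketch`, its methods, the universal hash family h(x) = (a*x + b) mod p, and the banding scheme used to build "multiple such hash tables" are all assumptions chosen for clarity. Each item is a set of integer feature ids; its signature keeps, for every hash function, the minimum hash over its features. The fraction of matching signature positions estimates the Jaccard similarity of two items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical MinHash sketch for illustration; not Mahout's implementation. */
public class MinHashSketch {
    // Prime modulus 2^31 - 1 for the assumed hash family h(x) = (a*x + b) mod P
    private static final long PRIME = 2147483647L;

    private final long[] a;
    private final long[] b;

    public MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed); // fixed seed makes signatures reproducible
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // a in [1, P-2], never 0
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // b in [0, P-1]
        }
    }

    /** For each hash function, keep the minimum hash value over the item's features. */
    public long[] signature(Set<Integer> features) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int f : features) {
            long x = Math.floorMod((long) f, PRIME); // map possibly negative ids into [0, P)
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // a*x < 2^62, so no long overflow
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    /** Fraction of positions where two signatures agree: an estimate of Jaccard similarity. */
    public static double estimatedSimilarity(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                same++;
            }
        }
        return (double) same / s1.length;
    }

    /** Exact Jaccard similarity |x ∩ y| / |x ∪ y|, for comparison. */
    public static double jaccard(Set<Integer> x, Set<Integer> y) {
        Set<Integer> inter = new HashSet<>(x);
        inter.retainAll(y);
        Set<Integer> union = new HashSet<>(x);
        union.addAll(y);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    /** Split a signature into bands of rowsPerBand values; each band key indexes one hash table. */
    public static List<String> bandKeys(long[] sig, int rowsPerBand) {
        List<String> keys = new ArrayList<>();
        for (int start = 0; start + rowsPerBand <= sig.length; start += rowsPerBand) {
            long[] band = Arrays.copyOfRange(sig, start, start + rowsPerBand);
            keys.add(start / rowsPerBand + ":" + Arrays.toString(band));
        }
        return keys;
    }

    public static void main(String[] args) {
        Set<Integer> doc1 = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        Set<Integer> doc2 = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 9));
        MinHashSketch mh = new MinHashSketch(100, 11L);
        long[] s1 = mh.signature(doc1);
        long[] s2 = mh.signature(doc2);
        System.out.printf("exact Jaccard = %.3f, MinHash estimate = %.3f%n",
            jaccard(doc1, doc2), estimatedSimilarity(s1, s2));
    }
}
```

Items that share any band key land in the same bucket of that band's table and become candidate near neighbors; more rows per band make a match stricter, more bands make it more forgiving. The band/row split here is an illustrative choice, not something the issue specifies.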
