Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Minhash Clustering
(https://cwiki.apache.org/confluence/display/MAHOUT/Minhash+Clustering)


Edited by Lance Norskog:
---------------------------------------------------------------------
Minhash clustering performs probabilistic dimension reduction of
high-dimensional data. The essence of the technique is to hash each item
with multiple independent hash functions, chosen so that similar items are
more likely to collide than dissimilar ones. For MinHash in particular, the
probability that two sets receive the same minimum hash value equals their
Jaccard similarity. Several such hash tables can then be built to answer
near-neighbor queries efficiently.
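The idea above can be illustrated with a small, self-contained sketch. It is
not Mahout's implementation. All function names are hypothetical, and it
assumes that items are sets of string tokens and that the random linear
family h(x) = (a*x + b) mod p is an acceptable stand-in for truly
independent hash functions. Each item is reduced to a signature of minimums,
one per hash function. The fraction of matching positions estimates Jaccard
similarity. Signatures are then split into b bands of r rows, with one hash
table per band, so that a pair with similarity s becomes a candidate with
probability 1 - (1 - s^r)^b.

```python
import random
import zlib
from collections import defaultdict

# Assumption: a Mersenne prime far larger than any 32-bit token hash.
PRIME = (1 << 61) - 1


def make_hash_params(num_hashes, seed=42):
    """Draw (a, b) pairs for hypothetical hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Return one minimum per hash function over the item's distinct tokens."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot compute a MinHash signature of an empty set")
    # crc32 gives a process-independent 32-bit id per token (unlike Python's hash()).
    ids = [zlib.crc32(t.encode("utf-8")) for t in token_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig1, sig2):
    """Fraction of signature positions that agree approximates Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    return matches / len(sig1)


def lsh_buckets(signatures, bands, rows):
    """Build one hash table per band; items sharing a band's rows share a bucket."""
    tables = [defaultdict(list) for _ in range(bands)]
    for item_id, sig in signatures.items():
        if bands * rows != len(sig):
            raise ValueError("bands * rows must equal the signature length")
        for band in range(bands):
            key = tuple(sig[band * rows:(band + 1) * rows])
            tables[band][key].append(item_id)
    return tables


def candidate_pairs(tables):
    """Collect every pair of items that collided in at least one band."""
    pairs = set()
    for table in tables:
        for members in table.values():
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    pairs.add(tuple(sorted((members[i], members[j]))))
    return pairs
```

A usage sketch: two sentences that differ in one word (Jaccard 7/9) should
usually land in the same bucket, while an unrelated sentence should not.

```python
docs = {
    "a": "the quick brown fox jumps over the lazy dog".split(),
    "b": "the quick brown fox jumps over the lazy cat".split(),
    "c": "completely unrelated words appear in this sentence".split(),
}
params = make_hash_params(128)
sigs = {k: minhash_signature(v, params) for k, v in docs.items()}
print(estimate_jaccard(sigs["a"], sigs["b"]))  # close to 7/9
print(candidate_pairs(lsh_buckets(sigs, bands=32, rows=4)))
```

Using more rows per band makes the similarity threshold stricter, and using
more bands raises recall. The choice of b and r trades false positives
against missed neighbors.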

A MinHashDriver class exists and is exercised by the TestMinHashClustering
unit test. However, it is not listed in the standard driver.props file, so
it is not available as a 'bin/mahout' command-line job.

