[ https://issues.apache.org/jira/browse/MAHOUT-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672834#comment-13672834 ]
Suneel Marthi commented on MAHOUT-1052:
---------------------------------------
This patch can be committed to trunk as part of the 0.8 release. I cleaned up
the patch to bring it in sync with the present codebase.
> Add an option to MinHashDriver that specifies the dimension of vector to hash
> (indexes or values)
> -------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1052
> URL: https://issues.apache.org/jira/browse/MAHOUT-1052
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.6
> Reporter: Elena Smirnova
> Assignee: Suneel Marthi
> Priority: Minor
> Labels: minhash
> Fix For: Backlog
>
> Attachments: MAHOUT-1052.patch
>
>
> Add a parameter to MinHash clustering that specifies which dimension of a
> vector to hash (indexes or values). The current version of MinHash clustering
> hashes only the values of vectors. Based on discussion on the dev-mahout list,
> both use cases are possible and occur frequently in practice.
> Preserve backward compatibility by defaulting the dimension to values. Add new
> unit tests.
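
To illustrate the difference between hashing indexes and hashing values, here
is a minimal sketch in plain Java. It does not use the Mahout API: the class
MinHashDimensionSketch, the enum Dimension, the method signature(), and the
simple linear hash functions are all hypothetical names and choices made for
this example, and a sparse vector is modelled as a Map from index to value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Illustrative only: not the MinHashDriver implementation from the patch.
final class MinHashDimensionSketch {

  /** Which part of each non-zero element is fed to the hash functions. */
  enum Dimension { INDEX, VALUE }

  /**
   * Computes a MinHash signature of a sparse vector (index -> value).
   * Each of the numHashes functions is a simple linear hash; the signature
   * keeps the minimum hash seen over all elements of the vector.
   */
  static int[] signature(Map<Integer, Double> vector, Dimension dim,
                         int numHashes, long seed) {
    Random rnd = new Random(seed);
    int[] sig = new int[numHashes];
    for (int h = 0; h < numHashes; h++) {
      int a = rnd.nextInt() | 1; // odd multiplier
      int b = rnd.nextInt();
      int min = Integer.MAX_VALUE;
      for (Map.Entry<Integer, Double> e : vector.entrySet()) {
        // Assumption: values are treated as integer item ids when hashed.
        int item = (dim == Dimension.INDEX)
            ? e.getKey()
            : e.getValue().intValue();
        int hash = (a * item + b) & Integer.MAX_VALUE; // force non-negative
        min = Math.min(min, hash);
      }
      sig[h] = min;
    }
    return sig;
  }

  public static void main(String[] args) {
    // Same values {3, 7} stored at different indexes.
    Map<Integer, Double> v1 = new LinkedHashMap<>();
    v1.put(0, 3.0);
    v1.put(1, 7.0);
    Map<Integer, Double> v2 = new LinkedHashMap<>();
    v2.put(5, 3.0);
    v2.put(9, 7.0);

    int[] byValue1 = signature(v1, Dimension.VALUE, 4, 42L);
    int[] byValue2 = signature(v2, Dimension.VALUE, 4, 42L);
    // The value sets are identical, so the VALUE signatures match.
    System.out.println(Arrays.equals(byValue1, byValue2)); // true

    // Hashing indexes instead compares {0, 1} with {5, 9}.
    System.out.println(Arrays.toString(signature(v1, Dimension.INDEX, 4, 42L)));
    System.out.println(Arrays.toString(signature(v2, Dimension.INDEX, 4, 42L)));
  }
}
```

Hashing values suits vectors whose elements store item ids, while hashing
indexes suits sparse vectors in which the index identifies the item and the
value is only a weight. Which of the two the patch treats as the default is
stated in the description above (values).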