[ https://issues.apache.org/jira/browse/MAHOUT-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil resolved MAHOUT-1117.
--------------------------------
Resolution: Won't Fix
> Vectors are not hashable
> ------------------------
>
> Key: MAHOUT-1117
> URL: https://issues.apache.org/jira/browse/MAHOUT-1117
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 1.0
> Reporter: Dan Filimon
> Priority: Minor
>
> No *Vector classes (DenseVector, WeightedVector, etc.) implement hashCode().
> As part of improving clustering in Mahout, Ted Dunning wrote prototype code
> for Streaming KMeans and Ball KMeans, which I'm working on with him.
> The two need to be used together in the MapReduce version.
> In Ball KMeans, we initialize the clusters using a probabilistic approach
> similar to k-means++. This requires a Multinomial<WeightedVector>
> distribution over the points we want to cluster, from which the centroids
> are picked.
> Internally, the Multinomial<T> uses a HashMap to keep track of the values it
> can sample from.
> Since Vectors don't override Object's hashCode(), vectors with identical
> contents can appear in the map multiple times (whenever their references
> differ).
> This is less of an issue because of how we're adding the vectors to the
> multinomial (we can guarantee that the references will be unique) and once
> MAHOUT-1116 is resolved the hashing will work okay for our needs.
> It still seems that it would be useful to have hashable vectors.
> What do you think? And what would a hash function look like?
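
To make the question above concrete, here is a minimal sketch of what a
content-based hash could look like. It is not Mahout code: the class
HashableVectorSketch and the helper distinctKeys are hypothetical names
introduced for illustration, and the vector is assumed to be a plain dense
array of doubles. Only standard JDK calls (Arrays.hashCode, Arrays.equals,
HashMap) are used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a dense vector; NOT part of the Mahout API.
final class HashableVectorSketch {
    private final double[] values;

    HashableVectorSketch(double... values) {
        // Defensive copy so later changes to the caller's array cannot alter the hash.
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof HashableVectorSketch)) {
            return false;
        }
        // Element-wise comparison of the two arrays.
        return Arrays.equals(values, ((HashableVectorSketch) other).values);
    }

    @Override
    public int hashCode() {
        // Content-based hash, consistent with equals() above.
        return Arrays.hashCode(values);
    }

    // Inserts every key into a HashMap with a dummy weight and
    // returns how many distinct keys the map ends up holding.
    static int distinctKeys(Object... keys) {
        Map<Object, Double> weights = new HashMap<>();
        for (Object key : keys) {
            weights.put(key, 1.0);
        }
        return weights.size();
    }

    public static void main(String[] args) {
        // Raw arrays inherit Object's identity-based hashCode()/equals(),
        // so two arrays with the same contents are two separate keys.
        System.out.println(distinctKeys(new double[] {1.0, 2.0},
                                        new double[] {1.0, 2.0})); // prints 2

        // With content-based hashCode()/equals() the duplicates collapse.
        System.out.println(distinctKeys(new HashableVectorSketch(1.0, 2.0),
                                        new HashableVectorSketch(1.0, 2.0))); // prints 1
    }
}
```

One caveat with any such approach: a HashMap assumes a key's hash does not
change while the key is in the map, so a mutable vector that is modified after
insertion can no longer be found under its old hash. A real implementation
would presumably also need dense and sparse vectors with the same contents to
hash identically, which this sketch does not address.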